Title of the invention

Synthetic gene for human granulocyte colony-stimulating factor for
expression in E. coli

Field of the invention

The present invention relates to a synthetic gene for human granulocyte colony-stimulating factor (hG-CSF) that enables expression in E. coli at an expression level equal to or higher than 52% recombinant hG-CSF relative to total proteins after expression.

hG-CSF belongs to the colony-stimulating factors, which regulate the differentiation and proliferation of mammalian hematopoietic cells. These factors play a crucial role in the formation of neutrophils and are therefore suitable for medical use in hematology and oncology.

Two forms are currently on the market for clinical use: lenograstim, which is glycosylated and is produced by expression in a mammalian cell line, and filgrastim, which is non-glycosylated and is produced by expression in the bacterium Escherichia coli (E. coli).

Summary of the invention

The essence of the present invention is that a synthetic gene encoding hG-CSF makes it possible to achieve an expression level (accumulation) in E. coli equal to or higher than 52% recombinant hG-CSF relative to total E. coli proteins. Expression uses an expression plasmid with a strong T7 promoter. The synthetic hG-CSF gene is prepared by a complex combination of two procedures which together allow the preparation of a synthetic hG-CSF gene optimized for expression in E. coli. The first procedure replaces, at certain positions, codons that are unfavourable for expression in E. coli with codons that are more favourable for E. coli. The second replaces, at certain positions, GC-rich regions with AT-rich regions. At some positions the synthetic hG-CSF gene according to the invention is prepared using one of the two methods, at other positions using a combination of both, and some positions remain unchanged. In preparing the synthetic hG-CSF gene, which is also a subject of the invention, regions outside the coding region are not changed. No changes are therefore introduced in the translation initiation region (TIR), in the ribosome binding site (RBS) or in the spacing between the start codon and the RBS.

Prior art

The effect of several consecutive unfavourable codons, such as the arginine (AGG/AGA; CGA), leucine (CTA), isoleucine (ATA) or proline (CCC) codons, on translation efficiency, and hence the reduced quantity or quality of the resulting protein expressed in E. coli, is described in Kane JF, Current Opinion in Biotechnology, 6:494-500 (1995). Individual unfavourable codons have a similar effect when they occur at different positions.
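The clustering effect described above can be screened for mechanically. The following Python sketch, offered only as an illustration, splits an in-frame coding sequence into codons and reports runs of consecutive codons drawn from a rare-codon set. The set contains only the codons named in the preceding paragraph and is an assumption, not a complete list of codons that are rare in E. coli; the function names are hypothetical.

```python
# Codons named above as unfavourable in E. coli (Kane, 1995).
# Illustrative assumption only: this is not a complete rare-codon list.
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}


def split_codons(seq):
    """Split an in-frame coding sequence into codons; a trailing partial codon is dropped."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rare_codon_runs(seq, rare=RARE_CODONS, min_run=2):
    """Return (index of first codon, run length) for each run of >= min_run rare codons."""
    runs = []
    start = None
    # The empty-string sentinel closes a run that reaches the last codon.
    for i, codon in enumerate(split_codons(seq) + [""]):
        if codon in rare:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs


if __name__ == "__main__":
    # Toy sequence: ATG AGG AGA CTA GCT CCC ATA TAA
    print(rare_codon_runs("ATGAGGAGACTAGCTCCCATATAA"))  # [(1, 3), (5, 2)]
```

Codon indices are 0-based, so (1, 3) is the AGG-AGA-CTA run that follows the start codon. Isolated rare codons, which the text notes also matter, are reported as runs of length 1 when the function is called with min_run=1.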

GC-rich regions also affect translation efficiency in E. coli when they cause a stable double-stranded RNA to form in the secondary structure of the mRNA. The effect is greatest when the GC-rich regions of the mRNA lie at the ribosome binding site, in the immediate vicinity of ribosome binding, or in the immediate vicinity of the start codon (Makrides SC, Microbiological Reviews, 60:512-538 (1996); Baneyx F, Current Opinion in Biotechnology, 10:411-421 (1999)). Several methods are known for assessing secondary structure and calculating the minimum free energy of an individual RNA molecule, which is considered the basic measure of the most stable, and hence most probable, structure (SantaLucia J Jr and Turner DH, Biopolymers, 44:309-319 (1997)). Except in certain cases, reliable algorithms for predicting the true secondary structure are not yet known, nor has it yet been possible to demonstrate a quantitative relationship with the expression level (Smit MH and van Duin J, J. Mol. Biol., 244:144-150 (1994)). Three-dimensional structures of proteins also cannot yet be predicted (Tinoco I and Bustamante C, J. Mol. Biol., 293:271-281 (1999)).
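A sliding-window GC count is a much cruder tool than the free-energy methods cited above, but it is a simple way to flag the GC-rich stretches the paragraph refers to. The Python sketch below is illustrative only: the window size and threshold are arbitrary assumptions, and the output says nothing about the actual secondary structure of the mRNA.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window, threshold):
    """0-based start positions of all windows whose GC fraction is >= threshold."""
    return [
        i for i in range(len(seq) - window + 1)
        if gc_fraction(seq[i:i + window]) >= threshold
    ]


if __name__ == "__main__":
    print(gc_rich_windows("ATATGCGCGCAT", window=4, threshold=1.0))  # [4, 5, 6]
```

Given the observation above that the effect is strongest near the ribosome binding site and the start codon, the 5' end of a coding sequence is the natural place to apply such a scan.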

Increasing the expression level by optimizing the DNA sequence in the TIR, the RBS and the spacing between the start codon and the RBS is described in McCarthy JEG and Brimacombe R, Trends Genet 10:402-407 (1994). In that case the higher expression level results from more efficient initiation of translation and smooth continuation into the coding region of the mRNA.

Production of sufficient amounts of hG-CSF for in vitro biological studies by expression in E. coli is described in Souza LM et al., Science 232:61-65 (1986) and in Zsebo KM et al., Immunobiology 172:175-184 (1986). The hG-CSF expression level achieved was below 1%.



Patent US4810643 describes the use of a synthetic hG-CSF gene constructed mainly by replacing unfavourable codons with codons that are optimal for E. coli. In combination with the thermoinducible promoter from phage lambda, an hG-CSF expression level of 3 to 5% of total cell proteins was achieved, which does not allow economical production on an industrial scale.

Accumulation of 8-10% hG-CSF relative to total cell proteins was achieved by changing the first four codons in the 5' region of the hG-CSF gene, as described in Wingfield P et al., Biochem. J. 256:213-218 (1988).

Expression of hG-CSF in E. coli with a yield of up to 17% hG-CSF relative to total bacterial proteins is described in Devlin PE et al., Gene 65:13-22 (1988). This yield was achieved by partial optimization of the DNA sequence at the 5' end of the gene (the codons for the first four amino acids), in which a GC-rich region was converted into an AT-rich one, and by using a relatively strong promoter from phage lambda. The expression level is not very high, which leads to lower production yields and reduced economy on an industrial scale.

The use of a synthetic gene and expression of about 30% are described in Kang SH et al., Biotechnology Letters, 17(7):687-692 (1995). This level was achieved by introducing codons favourable for E. coli, by modifications in the TIR and by additional modifications of codon sets, without substantially changing the 3' end of the gene. Reaching this expression level required changes to the hG-CSF gene in the TIR as well, and even so the expression level did not exceed 30%.

Patent US5840543 describes a synthetic hG-CSF gene constructed by introducing AT-rich regions at the 5' end of the gene and by replacing unfavourable codons with codons favourable for E. coli. Under the control of the Trp promoter, expression with a yield of 11% hG-CSF relative to total cell proteins was achieved. By adding leucine, threonine or a combination of both to the fermentation medium in which the bacteria were grown, up to 35% accumulation of hG-CSF relative to total cell proteins was achieved. In this case the expression level was raised by adding amino acids to the fermentation medium, which is an additional cost in the hG-CSF production process and is not economical on an industrial scale. Optimization of the hG-CSF gene alone did not allow a higher hG-CSF expression level.
The highest accumulation of hG-CSF relative to total cell proteins described to date, 48%, is reported in Jeong et al., Protein Expression and Purification 23:311-318 (2001). It was achieved by modifying the N-terminal part of the gene and by induction with 1 mM IPTG.

In general, there are no reports that would make it possible to predict the expression level of native human genes in prokaryotic organisms such as the bacterium E. coli. The reported expression levels are relatively low or even barely detectable, even when expression plasmids with strong promoters, such as the promoter from phage lambda or T7, are used. The literature shows that high accumulation of a human protein in E. coli is influenced by many parameters (unfavourable codons or their clustering, regions with a high number of GC base pairs, unfavourable mRNA secondary structures, unstable mRNA).

So far there is no fully developed rule for combining codons so that the resulting mRNA will have a secondary and tertiary structure that is optimal for expression. Mathematical and structural models for predicting secondary structures and their thermodynamic stability are available, but they are still too unreliable even for secondary structures, and for tertiary structures none exist at all. The models currently available therefore do not make it possible to predict how different codon combinations will affect expression.

Neither the patent literature nor the scientific literature describes more effective ways of solving the problem of the low expression level of the native hG-CSF gene in E. coli.

Description of the invention

We have found that the problem of the low expression level of the native hG-CSF gene in E. coli can be solved by optimizing the native hG-CSF gene and thereby preparing a synthetic hG-CSF gene. Compared with previously known data, this surprisingly yields a substantially higher expression level.

The term 'hG-CSF' means human granulocyte colony-stimulating factor and also includes recombinant hG-CSF obtained by expression in E. coli.

The synthetic hG-CSF gene according to the invention was prepared by changing the nucleotide sequence of the native hG-CSF gene in such a way that the amino acid sequence did not change and remained identical to that of native hG-CSF.
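The requirement that the amino acid sequence remain unchanged can be checked by translating both versions of the gene. The Python sketch below does this for the first 35 codons (Met1 to Lys35) as printed in Figure 2a (native) and Figure 2b (Fopt5), and lists the codons that differ. The helper names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

# Codons Met1-Lys35 as printed in Figure 2a (native) and Figure 2b (Fopt5).
NATIVE = ("ATG ACC CCC CTG GGC CCT GCC AGC TCC CTG CCC CAG AGC TTC CTG CTG AAG TGC "
          "TTA GAG CAA GTG AGG AAG ATC CAG GGC GAT GGC GCA GCG CTC CAG GAG AAG").replace(" ", "")
FOPT5 = ("ATG ACA CCA CTG GGT CCA GCT TCT TCT CTG CCG CAA AGC TTT CTG TTG AAA TGT "
         "TTA GAA CAA GTT CGT AAA ATT CAA GGT GAT GGT GCA GCT TTA CAA GAA AAA").replace(" ", "")


def translate(seq):
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def changed_codons(a, b):
    """1-based numbers of the codons at which two equal-length coding sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i // 3 + 1 for i in range(0, len(a), 3) if a[i:i + 3] != b[i:i + 3]]


if __name__ == "__main__":
    print(translate(NATIVE) == translate(FOPT5))  # True
    print(translate(FOPT5))                        # MTPLGPASSLPQSFLLKCLEQVRKIQGDGAALQEK
    print(len(changed_codons(NATIVE, FOPT5)))      # 26
```

The 26 differing codons are exactly the positions between Thr2 and Lys35 that appear in the segment I list of changes.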



The present invention further covers the expression in E. coli of the synthetic gene obtained in this way and the expression level of the synthetic gene obtained in this way.

The term 'expression level' means the proportion of hG-CSF obtained after heterologous expression of the hG-CSF gene in E. coli, relative to the total proteins present after expression.

The term 'heterologous expression' means expression of genes that are foreign to the organism in which expression takes place.

The term 'homologous expression' means expression of genes that are native to the organism in which expression takes place.

The term 'favourable codons' means the codons that a given organism (e.g. E. coli) uses to produce the largest number of mRNA molecules. The organism uses these codons in genes with high homologous expression.

The term 'unfavourable codons' means the codons that a given organism (e.g. E. coli) uses only in genes with low expression. These codons are rarely used in that organism (low homologous expression).

The term 'GC-rich regions' means regions of a gene in which the bases guanine (G) and cytosine (C) predominate.

The term 'AT-rich regions' means regions of a gene in which the bases adenine (A) and thymine (T) predominate.

The term 'synthetic gene' means a gene that differs from the native gene in its DNA sequence while its amino acid sequence remains unchanged. It is obtained by recombinant DNA techniques.

The term 'native gene' means a gene that has not been altered by recombinant DNA techniques.

The term 'segment' means an individual part of the gene bounded on both sides by unique restriction sites, which are used for subcloning the synthetically prepared parts of the gene.

The term 'segment I' means the 5' end of the hG-CSF gene between the restriction sites Nde I (3) and Sac I (194), i.e. a 191 bp sequence that was synthesized de novo.



The term 'segment II' means the part of the hG-CSF gene between the restriction sites Sac I (194) and Apa I (309), i.e. a 115 bp central part of the gene that was synthesized de novo.

The term 'segment III' means the part of the hG-CSF gene between the restriction sites Apa I (309) and Nhe I (467), i.e. a 158 bp part of the gene in which the native hG-CSF DNA sequence is retained except at Arg148 and Gly150.

The term 'segment IV' means the 3' end of the hG-CSF gene between the restriction sites Nhe I (467) and BamH I (536), i.e. a 69 bp terminal part of the gene that was synthesized de novo.
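The segment lengths follow directly from the restriction-site coordinates given in these definitions. The short Python sketch below recomputes them and checks that the four segments tile the gene without gaps; the data structures and function name are illustrative.

```python
# Restriction-site coordinates as given in the segment definitions above.
SITES = {"Nde I": 3, "Sac I": 194, "Apa I": 309, "Nhe I": 467, "BamH I": 536}

# (segment name, left site, right site)
SEGMENTS = [
    ("I", "Nde I", "Sac I"),
    ("II", "Sac I", "Apa I"),
    ("III", "Apa I", "Nhe I"),
    ("IV", "Nhe I", "BamH I"),
]


def segment_lengths(sites, segments):
    """Length in bp of each segment, i.e. the distance between its flanking sites."""
    lengths = {}
    previous_right = None
    for name, left, right in segments:
        # Each segment must begin at the site where the previous one ends.
        if previous_right is not None and left != previous_right:
            raise ValueError(f"segment {name} does not start where the previous one ends")
        lengths[name] = sites[right] - sites[left]
        previous_right = right
    return lengths


if __name__ == "__main__":
    print(segment_lengths(SITES, SEGMENTS))  # {'I': 191, 'II': 115, 'III': 158, 'IV': 69}
```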

The synthetic hG-CSF gene according to the invention is prepared by a combination of the following methods:

• replacement of codons unfavourable for E. coli with codons favourable for E. coli: in segment II (between the restriction sites Sac I (194) and Apa I (309)) and in segment IV (between the restriction sites Nhe I (467) and BamH I (536));

• replacement of certain GC-rich regions with AT-rich regions, in such a way that the codons least favourable for E. coli are also eliminated, while for the most part the codons most favourable for E. coli are not used: in segment I (between the restriction sites Nde I (3) and Sac I (194));

• a completely unchanged native sequence of 46 codons (from Pro102 to Arg147) within segment III;

• elimination of two codons unfavourable for E. coli (Arg148 and Gly150) at the end of segment III.

Optimization of the hG-CSF gene according to the invention does not include changes to the TIR, to the RBS or to the spacing between the start codon and the RBS.

When the synthetic hG-CSF gene according to the invention is expressed, an hG-CSF expression level in E. coli equal to or higher than 52% can be achieved; expression levels of about 55% or even about 60% are also attainable. The high expression level of the synthetic hG-CSF gene according to the invention enables high yields in hG-CSF production, faster and simpler purification and isolation of the heterologous hG-CSF, easier in-process control and better economy of the whole process, and thus enables efficient production of hG-CSF on an industrial scale. hG-CSF obtained in this way is suitable for clinical use in medicine.

Preparation of the synthetic hG-CSF gene according to the invention begins with pre-preparation of the native hG-CSF gene and of the plasmids. The native hG-CSF gene may be of human origin; the same principle can also be applied to any gene that is homologous in the regions containing the unique restriction sites used for subcloning the newly synthesized gene segments. The plasmid used for mutagenesis was chosen so that it allowed point mutations to be introduced sequentially. Selection, or substantial enrichment, of plasmids carrying the desired mutation was achieved by simultaneously changing a restriction site, namely from EcoRI to EcoRV or vice versa (Transformer™ Site-Directed Mutagenesis Kit (Clontech)). The gene and the plasmid are prepared so that mutations can also be introduced by cassette mutagenesis.

After pre-preparation of the native hG-CSF gene and the plasmids, the native hG-CSF gene is optimized, which yields the synthetic hG-CSF gene. Optimization begins by dividing the native hG-CSF gene into four segments (I, II, III and IV) that are, or after oligonucleotide mutagenesis will be, separated by unique restriction sites; changes are then made within the individual segments. In some segments the gene sequence is changed, while in others it is left unchanged (Figure 1). Preferably, the final optimized synthetic hG-CSF gene obtained in this way consists of a partially retained native sequence (segment III) and 5' and 3' coding regions synthesized de novo (segments I, II and IV).

Changes by segment:

Segment I: replacement of codons unfavourable for E. coli with codons favourable for E. coli, and replacement of GC-rich regions with AT-rich regions

Thr2 (ACC→ACA), Pro3 (CCC→CCA), Gly5 (GGC→GGT), Pro6 (CCT→CCA), Ala7 (GCC→GCT), Ser8 (AGC→TCT), Ser9 (TCC→TCT), Pro11 (CCC→CCG), Gln12 (CAG→CAA), Phe14 (TTC→TTT), Leu16 (CTC→TTG), Lys17 (AAG→AAA), Cys18 (TGC→TGT), Glu20 (GAG→GAA), Val22 (GTG→GTT), Arg23 (AGG→CGT), Lys24 (AAG→AAA), Ile25 (ATC→ATT), Gln26 (CAG→CAA), Gly27 (GGC→GGT), Gly29 (GGC→GGT), Ala31 (GCG→GCT), Leu32 (CTC→TTA), Gln33 (CAG→CAA), Glu34 (GAG→GAA), Lys35 (AAG→AAA), Ala38 (GCC→GCA), Thr39 (ACC→ACT), Tyr40 (TAC→TAT), Lys41 (AAG→AAA), Cys43 (TGC→TGT), His44 (CAC→CAT), Pro45 (CCC→CCA), Glu46 (GAG→GAA), Glu47 (GAG→GAA), Val49 (GTG→GTT), Leu51 (CTC→TTA), Gly52 (GGA→GGT), His53 (CAC→CAT), Gly56 (GGC→GGT), Ile57 (ATC→ATT), Pro58 (CCC→CCG), Pro61 (CCC→CCT)

Segment II: replacement of codons unfavourable for E. coli with codons favourable for E. coli

Cys65 (TGC→TGT), Pro66 (CCC→CCG), Ala69 (GCC→GCG), Leu76 (TTG→CTG), Leu79 (CTC→CTG), Gly82 (GGC→GGT), Leu83 (CTT→CTG), Phe84 (TTC→TTT), Leu85 (CTC→CTG), Tyr86 (TAC→TAT), Gly88 (GGG→GGT), Leu89 (CTC→CTG), Ala92 (GCC→GCG), Gly95 (GGG→GGC), Ile96 (ATA→ATT), Pro98 (CCC→CCG), Glu99 (GAG→GAA), Leu100 (TTG→CTG), Gly101 (GGT→GGG)

Segment III: replacement of two codons unfavourable for E. coli immediately upstream of the Nhe I restriction site

Arg148 (CGG→CGT), Gly150 (GGA→GGT)

Segment IV: replacement of a long cluster of codons unfavourable for E. coli at the end of the gene with codons favourable for E. coli

Gln159 (CAG→CAA), Ser160 (AGC→TCT), Phe161 (TTC→TTT), Glu163 (GAG→GAA), Val164 (GTG→GTT), Ser165 (TCG→AGC), Tyr166 (TAC→TAT), Arg167 (CGC→CGT), Leu169 (CTA→CTG), Arg170 (CGC→CGT), His171 (CAC→CAT), Leu172 (CTT→CTG), Ala173 (GCG→GCT), Pro175 (CCC→CCG)
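Because every entry in the lists of changes must be a silent substitution, each one can be checked mechanically: both codons must encode the amino acid named in the entry. The Python sketch below parses entries of the form 'Xaa123 (NNN→NNN)' and performs that check. The parser and its names are hypothetical, and the standard genetic code is assumed. A misread codon, for example 'GTA' (valine) in place of CTA for Leu169, is reported as not synonymous.

```python
import re
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches entries such as "Thr2 (ACC→ACA)".
CHANGE = re.compile(r"([A-Z][a-z]{2})\s*(\d+)\s*\(([ACGT]{3})\s*→\s*([ACGT]{3})\)")


def check_changes(text):
    """For every 'Xaa123 (NNN→NNN)' entry, report whether both codons encode Xaa."""
    report = []
    for residue, number, old, new in CHANGE.findall(text):
        aa = THREE_TO_ONE[residue]
        report.append((f"{residue}{number}", CODON_TABLE[old] == aa == CODON_TABLE[new]))
    return report


if __name__ == "__main__":
    print(check_changes("Arg23 (AGG→CGT), Ser165 (TCG→AGC), Leu169 (GTA→CTG)"))
    # [('Arg23', True), ('Ser165', True), ('Leu169', False)]
```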

After the synthetic hG-CSF gene has been prepared, the optimized synthetic gene is subcloned into a final plasmid vector selected from the group of pET vectors (Novagen), which contain a strong T7 promoter. The plasmid vector pET3a is preferably used. The resulting expression plasmid is then transformed into a production strain selected from the group of strains carrying the gene for T7 RNA polymerase in the chromosome, preferably E. coli BL21(DE3).

The process continues with preparation of the inoculum and fermentation. Fermentation is carried out at 37°C, preferably at 25°C.



The accumulated heterologous hG-CSF is deposited in inclusion bodies and is suitable for renaturation and for use in isolation procedures.

Description of the figures:

Figure 1: Scheme of the optimization steps for the hG-CSF gene

Figure 2: a) DNA sequence of the native hG-CSF gene (GenBank: NM_000759)

b) DNA sequence of the optimized (Fopt5) hG-CSF gene. Bases that differ from the native gene are shown in bold.

Figure 3: a) SDS-PAGE (4% stacking gel, 15% resolving gel; stained with Coomassie brilliant blue) of protein samples from non-induced and induced cultures of the production strains E. coli BL21(DE3) carrying the expression plasmid pET3a, at 25°C and 42°C. The cultures were grown in LBG10/amp100 medium.

Legend:

Lane 1: BL21(DE3) pET3a-hG-CSF, non-induced, 25°C (10 µl) (no trace of hG-CSF)

Lane 2: BL21(DE3) pET3a-hG-CSF, induced with IPTG, 25°C (10 µl) (faint trace of hG-CSF)

Lane 3: BL21(DE3) pET3a-hG-CSF, non-induced, 42°C (10 µl) (no trace of hG-CSF)

Lane 4: BL21(DE3) pET3a-hG-CSF, induced with IPTG, 42°C (10 µl) (below 1% hG-CSF)

Lane 5: filgrastim standard, 0.3 µg, for Coomassie brilliant blue

Lane 6: BL21(DE3) pET3a-Fopt5, non-induced, 25°C (5 µl) (6% hG-CSF)

Lane 7: BL21(DE3) pET3a-Fopt5, induced with IPTG, 25°C (5 µl) (above 50% hG-CSF)

b) Immunodetection (Western blot); primary rabbit antibodies; secondary goat anti-rabbit IgG antibodies conjugated with horseradish peroxidase; substrate p-naphthol

Samples for immunodetection were loaded in the same amounts and in the same order as for SDS-PAGE (Figure 3a), except for the standard, of which 0.08 µg was loaded.
Figure 4: SDS-PAGE (4% stacking gel, 15% resolving gel; stained with Coomassie brilliant blue) of protein samples from induced cultures of the production strain E. coli BL21(DE3) carrying the expression plasmid pET3a, at 25°C. The cultures were grown in GYSP/amp100 and LYSP/amp100 media.

Legend:

Lane 1: LMW (BioRad)

Lane 2: BL21(DE3) pET3a/P-Fopt5, culture grown in LYSP/amp100 (60% hG-CSF)

Lane 3: BL21(DE3) pET3a/P-Fopt5, culture grown in LYSP/amp100 (above 54% hG-CSF)

Lane 4: rhG-CSF (0.6 µg)

Lane 5: rhG-CSF (1.5 µg)

Lane 6: BL21(DE3) pET3a/P-Fopt5, culture grown in GYSP/amp100 (4 µl) (55% hG-CSF)

Lane 7: BL21(DE3) pET3a/P-Fopt5, culture grown in GYSP/amp100 (5 µl) (52% hG-CSF)

Examples:

Example 1: Preparation of the optimized gene Fopt5
Example 1a: Pre-preparation of the gene and plasmids

The hG-CSF gene was amplified from BBG13 (R&D) by PCR, and the primers were used to introduce the restriction sites NdeI and BamHI at the beginning and end of the gene. The gene was then inserted between the NdeI and BamHI restriction sites of plasmid pCytexΔH,H (described below), in which all steps of optimizing the gene for expression in E. coli were also carried out.

During pre-preparation, the EcoRV site was removed from the hG-CSF gene (oligonucleotide M20z108) so that point (single) mutations could be introduced by oligonucleotide-directed mutagenesis in plasmid pCytexΔH,H with the Transformer™ Site-Directed Mutagenesis Kit (Clontech). In plasmid pCytexΔH,H-G-CSF it was thus possible to select mutants via the EcoRI/EcoRV restriction sites.

The starting plasmid pCYTEXP1 (Medac, Hamburg) was modified to make expression constitutive: the part of the gene for the cI857 repressor lying between the two HindIII restriction sites was excised. The resulting plasmid was named pCytexΔH,H.

Oligonucleotide for removing the EcoRV site from the native hG-CSF gene:
M20z108 5'-CCT GGA AGG AAT ATC CCC CG-3'
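The effect of oligonucleotide M20z108 can be illustrated in a few lines of Python: compared with the native template (Figure 2a, Leu93 to Pro98), a single G→A change at the third position of the Gly95 codon removes the EcoRV site GATATC while the encoded amino acids stay the same. The sketch assumes the oligonucleotide is read on the sense strand in the same register as the template; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

ECORV = "GATATC"

# Native template (Figure 2a) and mutagenic oligonucleotide M20z108, both 5'->3'.
NATIVE = "CCTGGAAGGGATATCCCCCG"
M20Z108 = "CCTGGAAGGAATATCCCCCG"
FRAME = 1  # NATIVE[0] is the last base of the Ala92 codon; NATIVE[1:4] is Leu93


def find_sites(seq, site):
    """0-based start positions of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def translate(seq):
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    print(find_sites(NATIVE, ECORV), find_sites(M20Z108, ECORV))  # [9] []
    print(translate(NATIVE[FRAME:]), translate(M20Z108[FRAME:]))  # LEGISP LEGISP
```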

Example 1b: Codon optimization (Figure 1)

In the first optimization step, the synthetic part of the gene between the NdeI and SacI restriction sites was prepared by ligating five cassettes (A, B, C, D, E), each composed of complementary oligonucleotides. This synthetic part of the gene constitutes segment I. Segment I was then used to replace the part of the native hG-CSF gene between the NdeI and SacI restriction sites: the first part of the gene between the NdeI and SacI sites was excised and replaced with the synthetically prepared cassette. The procedure was carried out in two steps. First, cassette A, which anneals to the NdeI site, and cassette E, which anneals to the SacI site, were ligated. After 16 hours at 16°C, the ligation mixture was precipitated with ethanol to remove excess (unbound) oligonucleotides. In the second step, the middle part of the complete cassette (cassettes B, C and D), made of three pairs of previously annealed complementary oligonucleotides, was added and ligated again for 16 h at 16°C.

In the second optimization step, the two codons most critical for E. coli, Arg148 and Gly150, located in segment III, were replaced by oligonucleotide-directed mutagenesis (Transformer™ Site-Directed Mutagenesis Kit (Clontech)).

In the third optimization step, segment IV, the last part of the gene between the NheI and BamHI restriction sites, was prepared from two pairs of complementary oligonucleotides (cassettes F and G) in the same way as in the first step, but without the intermediate ethanol precipitation.

In the fourth optimization step, oligonucleotide-directed mutagenesis (Transformer™ Site-Directed Mutagenesis Kit (Clontech)) was used to replace the unfavourable codon for Ile96 (ATA→ATT) (segment II) and to introduce an Apa I restriction site (Gly101 GGT→GGG) at the 3' end of segment II.

The Apa I site was then used in the fifth optimization step to replace the native gene between Sac I and Apa I with synthetic DNA (segment II). This synthetic DNA is composed of three pairs of complementary oligonucleotides (cassettes H, I and J) and was assembled in the same way as in the first step, with cassette I added later.

Optimization step 1:

complementary oligonucleotide pairs (Nde I - Sac I; segment I in Figure 1):

Cassette A: composed of the complementary oligonucleotides zg1os1 and sp1os2:
zg1os1 5' TAT GAC ACC ACT GGG TCC AGC TTC TTC TCT GCC GCA AAG 3'
sp1os2 5' GCA GAG AAG AAG CTG GAC CCA GTG GTG TCA 3'

Cassette B: composed of the complementary oligonucleotides zg2os3 and sp2os4:
zg2os3 5' CTT TCT GTT GAA ATG TTT AGA ACA AGT TCG TAA AAT TCA AG 3'
sp2os4 5' GAA CTT GTT CTA AAC ATT TCA ACA GAA AGC TTT GCG 3'

Cassette C: composed of the complementary oligonucleotides zg3os5 and sp3os6:
zg3os5 5' GTG ATG GTG CAG CTT TAC AAG AAA AAC TGT GTG 3'
sp3os6 5' GTT TTT CTT GTA AAG CTG CAC CAT CAC CTT GAA TTT TAC 3'

Cassette D: composed of the complementary oligonucleotides zg4os7 and sp4os8:
zg4os7 5' CAA CTT ATA AAC TGT GTC ATC CAG AAG AAC TGG TTC TGT TAG 3'
sp4os8 5' CAG TTC TTC TGG ATG ACA CAG TTT ATA AGT TGC ACA CA 3'

Cassette E: composed of the complementary oligonucleotides zg5os9 and sp5os10:
zg5os9 5' GTC ATT CTC TGG GTA TTC CGT GGG CTC CTC TGA GCT 3'
sp5os10 5' CAG AGG AGC CCA CGG AAT ACC CAG AGA ATG ACC TAA CAG AAC 3'
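The five cassettes of step 1 can be checked in silico before ordering. The Python sketch below joins the sense-strand (zg) oligonucleotides, confirms that each antisense (sp) oligonucleotide pairs perfectly with the joined strand, reports the single-stranded ends, and translates the result. The sequences are transcribed from the cassette list; for zg1os1 and sp3os6, whose printed bases are partly illegible or not complementary, the sketch assumes the reading that agrees with SEQ ID NO: 2. Function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sense-strand (zg) oligonucleotides of cassettes A-E, 5'->3', in ligation order.
SENSE = [
    "TATGACACCACTGGGTCCAGCTTCTTCTCTGCCGCAAAG",     # zg1os1 (A), reading assumed
    "CTTTCTGTTGAAATGTTTAGAACAAGTTCGTAAAATTCAAG",   # zg2os3 (B)
    "GTGATGGTGCAGCTTTACAAGAAAAACTGTGTG",           # zg3os5 (C)
    "CAACTTATAAACTGTGTCATCCAGAAGAACTGGTTCTGTTAG",  # zg4os7 (D)
    "GTCATTCTCTGGGTATTCCGTGGGCTCCTCTGAGCT",        # zg5os9 (E)
]
# Antisense-strand (sp) oligonucleotides, 5'->3'.
ANTISENSE = {
    "sp1os2": "GCAGAGAAGAAGCTGGACCCAGTGGTGTCA",
    "sp2os4": "GAACTTGTTCTAAACATTTCAACAGAAAGCTTTGCG",
    "sp3os6": "GTTTTTCTTGTAAAGCTGCACCATCACCTTGAATTTTAC",  # reading assumed
    "sp4os8": "CAGTTCTTCTGGATGACACAGTTTATAAGTTGCACACA",
    "sp5os10": "CAGAGGAGCCCACGGAATACCCAGAGAATGACCTAACAGAAC",
}
SEGMENT_I = "".join(SENSE)  # the ligated sense strand


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def duplex_span(sense, antisense):
    """(start, end) of the sense-strand region paired by the antisense oligonucleotides.

    Bases of `sense` before `start` and from `end` onwards are single-stranded.
    Raises ValueError if an antisense oligonucleotide has no perfect partner.
    """
    starts, ends = [], []
    for name, seq in antisense.items():
        pos = sense.find(revcomp(seq))
        if pos < 0:
            raise ValueError(f"{name} does not pair with the sense strand")
        starts.append(pos)
        ends.append(pos + len(seq))
    return min(starts), max(ends)


if __name__ == "__main__":
    start, end = duplex_span(SEGMENT_I, ANTISENSE)
    print(len(SEGMENT_I), SEGMENT_I[:start], SEGMENT_I[end:])  # 191 TA AGCT
    print(translate(SEGMENT_I[1:]))  # Met1 to Ser63 of hG-CSF
```

With these sequences the joined sense strand is 191 nt long, matching the stated length of segment I; it carries a 5' TA overhang compatible with Nde I and a 3' AGCT overhang compatible with Sac I, and it encodes Met1 to Ser63.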

Optimization step 2: oligonucleotide for replacing the most critical codons by oligonucleotide-directed mutagenesis

replacement of Arg148 (CGG→CGT) and Gly150 (GGA→GGT):

m38os16

5' CTC TGC TTT CCA GCG CCG TGC AGG TGG GGT CCT GGT TG 3'
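For a mutagenic oligonucleotide such as m38os16, the intended changes can be confirmed by aligning it with the template. In the Python sketch below the template is the native region around Ser143 to Val154 as printed in Figure 2a; the sketch lists the mismatches and maps them to codon numbers, showing that both changes fall on third codon positions, of Arg148 and Gly150. The names and the frame offset are assumptions derived from that alignment.

```python
# Native template (Figure 2a): last base of the Ala142 codon, Ser143 to Val154, one more base.
TEMPLATE = "CTCTGCTTTCCAGCGCCGGGCAGGAGGGGTCCTGGTTG"
# Mutagenic oligonucleotide m38os16, written in the same register.
M38OS16 = "CTCTGCTTTCCAGCGCCGTGCAGGTGGGGTCCTGGTTG"
FRAME_OFFSET = 1   # TEMPLATE[1:4] is the Ser143 codon
FIRST_CODON = 143


def mismatches(template, oligo):
    """0-based positions at which two aligned, equal-length sequences differ."""
    if len(template) != len(oligo):
        raise ValueError("template and oligonucleotide must be aligned and equally long")
    return [i for i, (a, b) in enumerate(zip(template, oligo)) if a != b]


def codon_positions(positions, frame_offset, first_codon):
    """Map base positions to (codon number, position 1-3 within that codon)."""
    return [(first_codon + (p - frame_offset) // 3, (p - frame_offset) % 3 + 1)
            for p in positions]


if __name__ == "__main__":
    diff = mismatches(TEMPLATE, M38OS16)
    print(diff)                                              # [18, 24]
    print(codon_positions(diff, FRAME_OFFSET, FIRST_CODON))  # [(148, 3), (150, 3)]
```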



Optimization step 3: complementary oligonucleotide pairs (Nhe I - BamH I; segment IV in Figure 1):

Cassette F: composed of the complementary oligonucleotides zg6os11 and sp6os12:
zg6os11 5' CTA GCC ATC TGC AAT CTT TTC TGG AAG TTA G 3'
sp6os12 5' ACG ATA GCT AAC TTC CAG AAA AGA TTG CAG ATG G 3'

Cassette G: composed of the complementary oligonucleotides zg7os13 and sp7os14:
zg7os13 5' CTA TCG TGT TCT GCG TCA TCT GGC TCA GCC GTG ATA AG 3'
sp7os14 5' GAT CCT TAT CAC GGC TGA GCC AGA TGA CGC AGA AC 3'
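Segment IV can be checked in the same spirit. The Python sketch below joins zg6os11 and zg7os13, locates where each antisense oligonucleotide pairs, reports the 5' overhang left on the antisense strand, and translates the joined strand in frame. It assumes that the printed 'GOT' in sp6os12 reads GCT, the reading complementary to the sense oligonucleotides. Function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ZG6OS11 = "CTAGCCATCTGCAATCTTTTCTGGAAGTTAG"
ZG7OS13 = "CTATCGTGTTCTGCGTCATCTGGCTCAGCCGTGATAAG"
SP6OS12 = "ACGATAGCTAACTTCCAGAAAAGATTGCAGATGG"   # printed 'GOT' read as GCT (assumption)
SP7OS14 = "GATCCTTATCACGGCTGAGCCAGATGACGCAGAAC"
SEGMENT_IV = ZG6OS11 + ZG7OS13  # the ligated sense strand


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def pairing(sense, antisense, min_overlap=10):
    """Locate where an antisense oligonucleotide pairs with the sense strand.

    Returns (start index on the sense strand, unpaired 5' end of the antisense
    oligonucleotide). The antisense partner may run past the 3' end of `sense`.
    """
    rc = revcomp(antisense)
    for start in range(len(sense)):
        paired = sense[start:start + len(rc)]
        if len(paired) >= min_overlap and rc.startswith(paired):
            return start, antisense[:len(rc) - len(paired)]
    raise ValueError("no pairing found")


if __name__ == "__main__":
    print(pairing(SEGMENT_IV, SP6OS12))  # (4, '')
    print(pairing(SEGMENT_IV, SP7OS14))  # (38, 'GATC')
    print(translate(SEGMENT_IV[2:]))     # SHLQSFLEVSYRVLRHLAQP**
```

The joined sense strand is 69 nt long, as stated for segment IV. It starts with the Nhe I overhang CTAG, sp7os14 leaves the BamH I overhang GATC, and reading from the third base gives Ser156 to Pro175 followed by two stop codons (TGA TAA).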

Optimization step 4: oligonucleotide for introducing the Apa I site (Gly101 GGT→GGG) and replacing the unfavourable Ile96 codon (ATA→ATT) by oligonucleotide-directed mutagenesis:

Apalos15

5' GCC CTG GAG GGG ATT TCC CCC GAG TTG GGG CCC ACC TTG GAC AC 3'

Optimization step 5: complementary oligonucleotide pairs (Sac I - Apa I; segment II in Figure 1):

Cassette H: composed of the complementary oligonucleotides zg8os18 and sp8os19:
zg8os18 5' CCT GTC CGA GCC AGG CGC TGC AGC TGG CAG GCT GCC TGA G 3'
sp8os19 5' CCT GCC AGC TGC AGC GCC TGG CTC GGA CAG GAG CT 3'

Cassette I: composed of the complementary oligonucleotides zg9os20 and sp9os21:
zg9os20 5' CCA ACT GCA TAG CGG TCT GTT TCT GTA TCA GGG TCT GCT G 3'
sp9os21 5' CTG ATA CAG AAA CAG ACC GCT ATG CAG TTG GCT CAG GCA G 3'

Cassette J: composed of the complementary oligonucleotides zg10os22 and sp10os23:
zg10os22 5' CAG GCG CTG GAA GGC ATT TCC CCG GAA CTG GGG CC 3'
sp10os23 5' CCA GTT CCG GGG AAA TGC CTT CCA GCG CCT GCA GCA GAC C 3'

Example 2: Expression of the synthetic hG-CSF gene in E. coli

The optimized gene Fopt5 was excised from plasmid pCytexΔH,H with the restriction enzymes NdeI and BamHI, subcloned into the final expression plasmid pET3a (Novagen, Madison, USA) and transformed into the production strain E. coli BL21(DE3). Cultures were grown on a shaker at 160 rpm for 24 hours at 25°C or for 15 hours at 42°C:

- in LBG10/amp100 medium (10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, 100 mg/l ampicillin). Induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.

Cultures were grown on a shaker at 160 rpm for 24 hours at 25°C:

- in GYSP/amp100 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, trace metals, 100 mg/l ampicillin). Induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.

- in LYSP/amp100 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 6 g/l glycerol, 4 g/l lactose, trace metals, 100 mg/l ampicillin). In this case induction was achieved by the lactose added to the medium.
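As a practical aid, the compositions above can be scaled to a working volume, and the volume of IPTG stock needed for 0.4 mM follows from C1·V1 = C2·V2. The Python sketch below assumes a 1 M (1000 mM) IPTG stock, which the text does not specify, and neglects the small volume change on addition; the function names are hypothetical.

```python
# LBG10/amp100 composition in g per litre, as listed above (ampicillin 100 mg/l = 0.1 g/l).
LBG10_AMP100 = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0,
                "glucose": 10.0, "ampicillin": 0.1}


def scale_recipe(g_per_l, volume_l):
    """Grams of each component for the given medium volume in litres."""
    return {name: amount * volume_l for name, amount in g_per_l.items()}


def iptg_stock_ul(culture_ml, final_mm=0.4, stock_mm=1000.0):
    """Microlitres of IPTG stock giving final_mm in culture_ml (C1*V1 = C2*V2).

    The stock concentration (1 M by default) is an assumption, and the added
    volume is neglected, which is reasonable when stock_mm >> final_mm.
    """
    return final_mm * culture_ml * 1000.0 / stock_mm


if __name__ == "__main__":
    print(scale_recipe(LBG10_AMP100, 0.5))
    print(iptg_stock_ul(50))  # 20.0
```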

The inoculum was prepared in LBG/amp100 medium (10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, 2.5 g/l glucose) with 100 mg/l ampicillin, at 25°C and 160 rpm overnight.

For analysis, 8 ml of each culture was centrifuged at 5000 rpm. The pellets were then resuspended in 10 mM Tris-HCl, pH 8.0, using 0.66 ml of buffer per OD600 unit. In this way the amounts of total protein loaded were equalized, since the final OD600 values of the cultures in these examples differed. The samples were mixed 3:1 with 4x SDS sample buffer containing DTT (pH 8.7), heated for 10 minutes at 95°C and centrifuged, and the samples prepared in this way were loaded onto the gel.
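The normalization rule above can be written out explicitly: since each 8 ml pellet contains cell mass roughly proportional to the final OD600 of its culture, resuspending it in 0.66 ml of buffer per OD600 unit gives suspensions of approximately equal cell density. The Python sketch below is illustrative; it assumes a linear relation between OD600 and cell mass, and the function names are hypothetical.

```python
ML_BUFFER_PER_OD_UNIT = 0.66  # ml of 10 mM Tris-HCl, pH 8.0, per OD600 unit (from the text)


def resuspension_volume_ml(final_od600, ml_per_od=ML_BUFFER_PER_OD_UNIT):
    """Buffer volume for a pellet from 8 ml of culture with the given final OD600.

    Assumes cell mass in the pellet scales linearly with OD600, so the volume
    scales the same way and all suspensions reach a similar cell density.
    """
    if final_od600 <= 0:
        raise ValueError("OD600 must be positive")
    return ml_per_od * final_od600


def sds_mix_ul(total_ul, sample_parts=3, buffer_parts=1):
    """Volumes (sample, 4x SDS sample buffer) for a mix in the stated 3:1 ratio."""
    unit = total_ul / (sample_parts + buffer_parts)
    return sample_parts * unit, buffer_parts * unit


if __name__ == "__main__":
    for od in (2.5, 4.0, 5.0):
        print(od, round(resuspension_volume_ml(od), 3))
    print(sds_mix_ul(40))  # (30.0, 10.0)
```

The 3:1 ratio is what brings the 4x sample buffer to a 1x final concentration in the loaded sample.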

The proportions of accumulated hG-CSF, which is deposited in the form of inclusion bodies, are given for the native and the optimized gene in Table 1.



Table 1: Comparison of hG-CSF accumulation levels with the native and the optimized (Fopt5) gene

Proportion (%) of hG-CSF in total proteins

Expression system            Growth medium and induction    Native gene, 25°C    Native gene, 42°C    Optimized gene Fopt5
pET3a / E. coli BL21(DE3)    LBG10/amp100, 0.4 mM IPTG      trace                < 1%                 > 40%
pET3a / E. coli BL21(DE3)    GYSP/amp100                    < 1%                 < 1%                 > 52%
pET3a / E. coli BL21(DE3)    LYSP/amp100                    < 1%                 < 1%                 > 52%
For Fopt5, the hG-CSF contents given were obtained by densitometric analysis of SDS-PAGE gels stained with Coomassie brilliant blue (Figure 3a and Figure 4); for the non-optimized gene they were obtained after immunodetection (Figure 3b). For Fopt5, the relative proportion was determined by profile analysis (Molecular Analyst software; BioRad) of the gels on an Imaging Densitometer Model GS670 (BioRad).
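The expression level defined earlier (hG-CSF as a share of total proteins) corresponds, in a densitometric lane profile, to the hG-CSF band's share of the summed band signal. The Python sketch below shows only that final arithmetic on hypothetical, background-corrected band volumes; it does not reproduce the Molecular Analyst profile analysis, and the numbers are invented for illustration.

```python
def expression_level_percent(band_volumes, target_index):
    """Share (%) of one band in the summed background-corrected band volumes of a lane."""
    total = sum(band_volumes)
    if total <= 0:
        raise ValueError("lane has no signal")
    return 100.0 * band_volumes[target_index] / total


if __name__ == "__main__":
    # Hypothetical band volumes for one lane; index 2 stands for the hG-CSF band.
    lane = [12.0, 8.0, 55.0, 25.0]
    print(expression_level_percent(lane, 2))  # 55.0
```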



Description of DNA sequences



<110> Lek farmacevtska družba d. d.

<120> Synthetic gene for human granulocyte colony-stimulating factor for expression in E. coli

<160> 2



<210> SEQ ID NO: 1

<211> 525 base pairs

<212> DNA

<213> synthetic sequence

<220> gene

<400> SEQ ID NO: 1

atgacaccac tgggtccagc ttcttctctg ccgcaaagct ttctgttgaa atgtttagaa     60
caagttcgta aaattcaagg tgatggtgca gctttacaag aaaaactgtg tgcaacttat    120
aaactgtgtc atccagaaga actggttctg ttaggtcatt ctctgggtat tccgtgggct    180
cctctgagct cctgtccgag ccaggcgctg cagctggcag gctgcctgag ccaactgcat    240
agcggtctgt ttctgtatca gggtctgctg caggcgctgg aaggcatttc cccggaactg    300
gggcccacct tggacacact gcagctggac gtcgccgact ttgccaccac catctggcag    360
cagatggaag aactgggaat ggcccctgcc ctgcagccca cccagggtgc catgccggcc    420
ttcgcctctg ctttccagcg ccgtgcaggt ggggtcctgg ttgctagcca tctgcaatct    480
tttctggaag ttagctatcg tgttctgcgt catctggctc agccg                    525



<210> SEQ ID NO: 2

<211> 528 base pairs

<212> DNA

<213> synthetic sequence

<220> gene
<400> SEQ ID NO: 2

atgacaccac tgggtccagc ttcttctctg ccgcaaagct ttctgttgaa atgtttagaa     60
caagttcgta aaattcaagg tgatggtgca gctttacaag aaaaactgtg tgcaacttat    120
aaactgtgtc atccagaaga actggttctg ttaggtcatt ctctgggtat tccgtgggct    180
cctctgagct cctgtccgag ccaggcgctg cagctggcag gctgcctgag ccaactgcat    240
agcggtctgt ttctgtatca gggtctgctg caggcgctgg aaggcatttc cccggaactg    300
gggcccacct tggacacact gcagctggac gtcgccgact ttgccaccac catctggcag    360
cagatggaag aactgggaat ggcccctgcc ctgcagccca cccagggtgc catgccggcc    420
ttcgcctctg ctttccagcg ccgtgcaggt ggggtcctgg ttgctagcca tctgcaatct    480
tttctggaag ttagctatcg tgttctgcgt catctggctc agccgtga                 528



Lek farmacevtska družba d. d.



Patent claims

1. A DNA sequence encoding hG-CSF, characterized in that it comprises the nucleotide sequence of SEQ ID NO: 1.

2. A DNA sequence, characterized in that it comprises a nucleotide sequence selected from the group consisting of a partial sequence of SEQ ID NO: 1 and nucleic acids that hybridize with the sequence of SEQ ID NO: 1 under stringent conditions.

3. An expression plasmid, characterized in that it comprises the DNA sequence according to claim 1 and a plasmid vector.

4. An expression plasmid, characterized in that it comprises the DNA sequence according to claim 2 and a plasmid vector.



5. The expression plasmid according to claims 3 and 4, characterized in that the plasmid vector is selected from the group of pET vectors.

6. An expression system for expressing the DNA sequence according to claim 1, characterized in that it comprises the DNA sequence, a plasmid vector and an E. coli production strain.

7. An expression system for expressing the DNA sequence according to claim 2, characterized in that it comprises the DNA sequence, a plasmid vector and an E. coli production strain.

8. The expression system according to claims 6 and 7, characterized in that the plasmid vector is selected from the group of pET vectors.

9. The expression system according to claims 6 and 7, characterized in that the production strain is E. coli BL21(DE3).

10. Preparation of the DNA sequence according to claim 1, characterized in that it comprises procedures selected from the group consisting of:

- replacement of certain codons unfavourable for E. coli with codons favourable for E. coli,

- replacement of certain GC-rich regions with AT-rich regions,

and further comprises a completely unchanged part of the native hG-CSF sequence.

11. Preparation of the DNA sequence according to claim 10, which further does not include changes in regions selected from the group consisting of: translation initiation regions, ribosome binding site regions, and regions in the spacing between the start codon and the ribosome binding site.

12. Ekspreslja DNA zaporedja po zahtevku 1 v £. co//. 

13. Ekspresija DNA zaporedja po zahtevku 2 v E. coU. 




Abstract

The invention relates to a synthetic gene coding for hG-CSF which enables expression in E. coli with an expression level of more than 52% recombinant hG-CSF relative to the total proteins after expression.
Figure 2

a)

ATGACCCCCCTGGGCCCTGCCAGCTCCCTGCCCCAGAGCTTCCTGCTGAAGTG 

CTTAGAGCAAGTGAGGAAGATCCAGGGCGATGGCGCAGCGCTCCAGGAGAAGC 

TGTGTGCCACCTACAAGCTGTGCCACCCCGAGGAGCTGGTGCTGCTCGGACAC 

TCTCTGGGCATCCCCTGGGCTCCCCTGAGCTCCTGCCCCAGCCAGGCCCTGCA 

GCTGGCAGGCTGCTTGAGCCAACTCCATAGCGGCCTTTTCCTCTACCAGGGGC 

TCCTGCAGGCCCTGGAAGGGATATCCCCCGAGTTGGGTCCCACCTTGGACACA 

CTGCAGCTGGACGTCGCCGACTTTGCCACCACCATCTGGCAGCAGATGGAAGA 

ACTGGGAATGGCCCCTGCCCTGCAGCCCACCCAGGGTGCCATGCCGGCCTTCG 

CCTCTGCTTTCCAGCGCCGGGCAGGAGGGGTCCTGGTTGCTAGCCATCTGCAG 

AGCTTCCTGGAGGTGTCGTACCGCGTTCTACGCCACCTTGCGCAGCCC 

b) 

ATGACACCACTGGGTCCAGCTTCTTCTCTGCCGCAAAGCTTTCTGTTGAAATG 
TTTAGAACAAGTTCGTAAAATTCAAGGTGATGGTGCAGCTTTACAAGAAAAAC 
TGTGTGCAACTTATAAACTGTGTCATCCAGAAGAACTGGTTCTGTTAGGTCAT 
TCTCTGGGTATTCCGTGGGCTCCTCTGAGCTCCTGTCCGAGCCAGGCGCTGCA 
GCTGGCAGGCTGCCTGAGCCAACTGCATAGCGGTCTGTTTCTGTATCAGGGTC
TGCTGCAGGCGCTGGAAGGCATTTCCCCGGAACTGGGGCCCACCTTGGACACA 
CTGCAGCTGGACGTCGCCGACTTTGCCACCACCATCTGGCAGCAGATGGAAGA
ACTGGGAATGGCCCCTGCCCTGCAGCCCACCCAGGGTGCCATGCCGGCCTTCG 
CCTCTGCTTTCCAGCGCCGTGCAGGTGGGGTCCTGGTTGCTAGCCATCTGCAA 
TCTTTTCTGGAAGTTAGCTATCGTGTTCTGCGTCATCTGGCTCAGCCG 
Title of the invention

Synthetic gene coding for human granulocyte-colony stimulating factor for the expression in E. coli

Field of the invention

The present invention relates to a synthetic gene coding for human granulocyte-colony stimulating factor (hG-CSF) which enables expression in E. coli at an expression level equal to or higher than 52% recombinant hG-CSF relative to the total proteins after expression.

hG-CSF belongs to the family of colony stimulating factors, which regulate the differentiation and proliferation of mammalian hematopoietic cells. They have a major role in neutrophil formation and are therefore suitable for use in medicine in the field of hematology and oncology.

Two forms of hG-CSF are currently available on the market for clinical use: lenograstim, which is glycosylated and is obtained by expression in a mammalian cell line, and filgrastim, which is non-glycosylated and is obtained by expression in the bacterium Escherichia coli (E. coli).

Summary of the invention 

The essential feature of the present invention is that the use of a synthetic gene coding for hG-CSF makes it possible to attain a level of expression (accumulation) in E. coli equal to or higher than 52% recombinant hG-CSF relative to the total proteins in E. coli. An expression plasmid containing a strong T7 promoter is used for the expression. The synthetic gene coding for hG-CSF is constructed by a complex combination of two methods which enable the construction of an optimized synthetic gene coding for hG-CSF for expression in E. coli. The first method is the replacement of some rare E. coli codons, which are unfavorable for expression in E. coli, with E. coli preference codons, which are more favorable for expression in E. coli. The second method is the replacement of some GC rich regions with AT rich regions. Some parts of the synthetic gene of the present invention are constructed by one of the two methods, some parts by a combination of both methods, whereas some parts of the gene are not changed. In the construction of the synthetic gene coding for hG-CSF, which is also the subject of the present invention, the non-coding regions are not changed. This means that there are no modifications in the translation initiation region (TIR), in the ribosome binding site (RBS) or in the region between the start codon and the RBS.

Background of the invention 

The impact of several successive rare codons, such as the arginine codons (AGG/AGA, CGA), the leucine codon (CTA), the isoleucine codon (ATA) and the proline codon (CCC), on the level of translation, and consequently on the decrease in the amount and quality of the protein expressed in E. coli, is described in Kane JF, Current Opinion in Biotechnology, 6:494-500 (1995). Individual rare codons have a similar impact if they occur in different parts of the gene.

GC rich regions also have an impact on the translational efficiency in E. coli if a stable double-stranded RNA is formed in the mRNA secondary structure. This impact is greatest when the GC rich regions of the mRNA are located in the RBS, in the direct proximity of the RBS, or in the direct proximity of the start codon (Makrides SC, Microbiological Reviews, 60:512-538 (1996); Baneyx F, Current Opinion in Biotechnology, 10:411-421 (1999)).

Several methods are known for predicting the secondary structure and calculating the minimal free energy of an individual RNA molecule, which is assumed to determine the most stable / most probable structure (SantaLucia J Jr and Turner DH, Biopolymers, 44:309-319 (1997)). Except in some cases, no reliable algorithms are known for predicting the correct secondary structure, and no quantitative correlation with the expression level has been demonstrated (Smit MH and van Duin J, J. Mol. Biol., 244:144-150 (1994)). It is still impossible to predict the tertiary structures of RNA (Tinoco I and Bustamante C, J. Mol. Biol., 293:271-281 (1999)).

The increase in the expression level after optimization of the DNA sequence in the TIR region, in the RBS region and in the region between the start codon and the RBS is described in McCarthy JEG and Brimacombe R, Trends Genet 10:402-407 (1994). In this case the expression level increased owing to more efficient translation initiation and its smooth continuation in the mRNA coding region.

The production of hG-CSF by expression in E. coli in amounts adequate for in vitro biological studies is described in Souza LM et al, Science 232:61-65 (1986) and in Zsebo KM et al, Immunobiology 172:175-184 (1986). The hG-CSF expression level was lower than 1%.

The patent US4810643 discloses the use of a synthetic gene coding for hG-CSF which was constructed primarily by replacing E. coli rare codons with E. coli preference codons. In combination with the thermoinducible phage lambda promoter, this led to an expression level of 3 to 5% hG-CSF relative to the total cellular proteins. This level was not sufficient for economical large-scale production of hG-CSF.

Accumulation of 8-10% hG-CSF relative to the total cellular proteins was reached by changing the first four codons at the 5' end of the hG-CSF coding region, as described in Wingfield P et al, Biochem. J, 256:213-218 (1988).

The expression of hG-CSF in E. coli with a yield of up to 17% hG-CSF relative to the total cellular bacterial proteins is described in Devlin PE et al, Gene 65:13-22 (1988). This yield was reached by partial optimization of the DNA sequence at the 5' end of the G-CSF coding region (the codons for the first four amino acids), whereby a GC rich region was replaced with an AT rich region, and by using a relatively strong lambda phage promoter. This expression level is not very high, which leads to lower production yields and is less economical in large-scale production.

The use of a synthetic gene and an expression level of about 30% are described in Kang SH et al, Biotechnology Letters, 17(7):687-692 (1995). This level was attained by introducing E. coli preference codons, by modifications in the TIR region and by additional modifications of codon sets, whereby the 3' end of the gene was not substantially changed. Thus, attaining the stated expression level required changes of the gene in the TIR region, and the expression level did not exceed 30%. The patent US5840543 describes a synthetic gene coding for hG-CSF which was constructed by introducing AT rich regions at the 5' end of the gene and by replacing E. coli rare codons with E. coli preference codons. Under the control of the Trp promoter, expression with a yield of 11% hG-CSF relative to the total cellular proteins was reached. On the other hand, the addition of leucine, threonine or a combination of both to the fermentation medium in which the bacteria were cultivated led to the accumulation of up to 35% hG-CSF relative to the total cellular proteins. This expression level was therefore reached by adding amino acids to the fermentation medium, which is an additional cost in the process for the production of hG-CSF and is not economical for industrial production. Optimization of the gene coding for hG-CSF alone did not enable a higher expression level of hG-CSF.

The highest accumulation of hG-CSF relative to the total cellular proteins found in the prior art, 48%, is described in Jeong et al, Protein Expression and Purification 23:311-318 (2001). This accumulation was obtained by changes at the N-terminal end and by induction with 1 mM IPTG.

In general, there are no reports on possible predictions of the expression level of native human genes in prokaryotic organisms, e.g. the bacterium E. coli. The described expression levels are relatively low or difficult to detect even when expression plasmids with strong promoters, e.g. from lambda or T7 phage, are used. The prior art literature indicates that many parameters (rare codons or their clustering, GC rich regions, unfavorable mRNA secondary structures, unstable mRNA) have an impact on the accumulation of a human protein in E. coli.

Until now no fully developed rule has been known on how to combine codons in order to obtain secondary or tertiary mRNA structures which are optimal for expression. Although some mathematical and structural models exist for predicting secondary structures and their thermodynamic stability, they are too unreliable to predict the secondary structures. Moreover, no such models exist for predicting tertiary structures. The currently accessible models therefore do not enable prediction of the impact of the codons on the expression level.

Neither the patent nor the scientific literature reports a more efficient way of solving the problem of the low expression level of the native gene coding for hG-CSF in E. coli.
Description of the invention

It has been found that the problem of the low expression level of the native gene coding for hG-CSF in E. coli can be solved by optimizing the native gene coding for hG-CSF, which leads to the construction of a synthetic gene coding for hG-CSF. Compared with the data described in the art, a surprisingly high expression level can be obtained.

The term 'hG-CSF', as used herein, refers to human granulocyte-colony stimulating factor, including the recombinant hG-CSF obtained by expression in E. coli.

The synthetic gene encoding hG-CSF of the present invention was obtained by introducing changes in the nucleotide sequence of the gene encoding the native hG-CSF. The amino acid sequence was not changed and remained identical to that of the native hG-CSF.

The present invention further comprises the expression of the synthetic gene in E. coli and the level of expression of the synthetic gene.

The term 'expression level', as used herein, refers to the proportion of hG-CSF obtained after heterologous expression of the gene encoding hG-CSF relative to the total cellular proteins after expression.

The term 'heterologous expression', as used herein, refers to the expression of genes which are foreign to the organism in which the expression occurs.

The term 'homologous expression', as used herein, refers to the expression of genes which are native to the organism in which the expression occurs.

The term 'preference codons', as used herein, refers to the codons used by an 
individual organism (e.g. E. coli) for the production of most mRNA molecules. The 
organism uses these codons for expressing genes with high homologous expression. 

The term 'rare codons', as used herein, refers to the codons used by an individual organism (e.g. E. coli) only for expressing genes with a low expression level. These codons are rarely used in the organism (low homologous expression).

The term 'GC rich regions', as used herein, refers to the regions in the gene 
where the bases guanine (G) and cytosine (C) prevail. 

The term 'AT rich regions', as used herein, refers to the regions in the gene where the bases adenine (A) and thymine (T) prevail.
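GC and AT rich regions, as defined above, can be quantified as the fraction of G and C bases in a window moving along the gene. The following Python sketch is illustrative only: the two 36-base strings are the first 12 codons (Met1 to Gln12) of the native gene and of Fopt5 as shown in Figure 2, while the 12-base window and the 0.7 cut-off for calling a window "GC rich" are assumptions, not values given in this document.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int, step: int = 1) -> list[tuple[int, float]]:
    """Slide a window along seq and return (0-based start, GC fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Met1 to Gln12 of the native gene (Figure 2a) and of Fopt5 (Figure 2b).
NATIVE_5P = "ATGACCCCCCTGGGCCCTGCCAGCTCCCTGCCCCAG"
FOPT5_5P = "ATGACACCACTGGGTCCAGCTTCTTCTCTGCCGCAA"

if __name__ == "__main__":
    print(f"native GC: {gc_fraction(NATIVE_5P):.2f}")  # 27 of 36 bases are G or C
    print(f"Fopt5  GC: {gc_fraction(FOPT5_5P):.2f}")   # 20 of 36 bases are G or C
    # Assumed 12-base window and assumed 0.7 cut-off, for illustration only.
    rich = [start for start, frac in gc_windows(NATIVE_5P, 12) if frac > 0.7]
    print("GC-rich windows in the native 5' end start at:", rich)
```

A fixed window is the simplest choice; a real design tool would typically combine such a scan with the codon-usage criteria described below.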



The term 'synthetic gene', as used herein, refers to the gene which differs from 
the native gene only in the nucleotide sequence whereby the amino acid sequence 
remains unchanged. The synthetic gene is obtained by the techniques of the 
recombinant DNA technology. 

The term 'native gene', as used herein, refers to a gene which is not modified by the techniques of recombinant DNA technology.

The term 'segment', as used herein, refers to the parts of the genes which are 
bounded by single restriction sites on both ends. These sites serve as subcloning 
sites for the synthetically constructed parts of the gene. 
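Because each segment must be bounded by sites that occur only once, a uniqueness check on the sequence is a natural first step. The Python sketch below (an illustration, not a tool used in this work) scans one strand for the recognition sequences of the enzymes named in this document; all of them are palindromic, so one strand suffices. The two 53-base fragments are copied from Figure 2 (native and Fopt5, the region spanning Ile96 and Gly101) and show the EcoRV site (GATATC) of the native sequence and the Apa I site (GGGCCC) present in Fopt5.

```python
# Recognition sequences (5'->3') of the enzymes named in this document.
RECOGNITION_SITES = {
    "NdeI": "CATATG",
    "SacI": "GAGCTC",
    "ApaI": "GGGCCC",
    "NheI": "GCTAGC",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "EcoRV": "GATATC",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def site_report(seq: str) -> dict[str, list[int]]:
    """Map each enzyme to its match positions; exactly one position means a unique site."""
    return {name: find_sites(seq, site) for name, site in RECOGNITION_SITES.items()}


# Fragments copied from Figure 2a (native) and Figure 2b (Fopt5).
NATIVE_FRAGMENT = "TCCTGCAGGCCCTGGAAGGGATATCCCCCGAGTTGGGTCCCACCTTGGACACA"
FOPT5_FRAGMENT = "TGCTGCAGGCGCTGGAAGGCATTTCCCCGGAACTGGGGCCCACCTTGGACACA"

if __name__ == "__main__":
    for label, fragment in (("native", NATIVE_FRAGMENT), ("Fopt5", FOPT5_FRAGMENT)):
        found = {name: pos for name, pos in site_report(fragment).items() if pos}
        print(label, found)
```

Run on the full gene plus vector, any enzyme whose list has more than one entry could not serve as a segment boundary.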

The term 'segment I', as used herein, refers to the 5' end of the gene encoding hG-CSF between the restriction sites Nde I (3) and Sac I (194), i.e. a 191 bp long sequence which was synthesized de novo.

The term 'segment II', as used herein, refers to the part of the gene for hG-CSF between the restriction sites Sac I (194) and Apa I (309), i.e. the 115 bp long central part of the gene which was synthesized de novo.

The term 'segment III', as used herein, refers to the part of the gene for hG-CSF between the restriction sites Apa I (309) and Nhe I (467), i.e. the 158 bp long part of the gene in which the native DNA sequence for hG-CSF is preserved with the exception of Arg148 and Gly150.

The term 'segment IV', as used herein, refers to the 3' terminal end of the gene encoding hG-CSF between the restriction sites Nhe I (467) and BamH I (536), i.e. the 69 bp long terminal part of the gene which was synthesized de novo.
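The segment lengths follow directly from the site coordinates, since each segment runs from one site to the next. A short Python check using only the coordinates stated in the definitions above (the coordinate convention itself is not specified further in this document):

```python
# Restriction-site coordinates exactly as given in the segment definitions.
SITE_POSITIONS = [
    ("Nde I", 3),
    ("Sac I", 194),
    ("Apa I", 309),
    ("Nhe I", 467),
    ("BamH I", 536),
]
SEGMENT_NAMES = ["I", "II", "III", "IV"]


def segment_lengths(sites: list[tuple[str, int]]) -> dict[str, int]:
    """Length in bp of each segment bounded by consecutive restriction sites."""
    if len(sites) != len(SEGMENT_NAMES) + 1:
        raise ValueError("expected exactly one more site than segments")
    return {
        name: end_pos - start_pos
        for name, (_, start_pos), (_, end_pos) in zip(SEGMENT_NAMES, sites, sites[1:])
    }


if __name__ == "__main__":
    lengths = segment_lengths(SITE_POSITIONS)
    print(lengths)                # {'I': 191, 'II': 115, 'III': 158, 'IV': 69}
    print(sum(lengths.values()))  # 533, the span from Nde I (3) to BamH I (536)
```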

The synthetic gene encoding hG-CSF of the present invention is constructed 
by the combination of the following methods: 

• replacement of E. coli rare codons with E. coli preference codons: in segment II (between the restriction sites Sac I (194) and Apa I (309)) and in segment IV (between the restriction sites Nhe I (467) and BamH I (536))

• replacement of some GC rich regions with AT rich regions, whereby the rarest E. coli codons are replaced, but mostly not with the E. coli preference codons: in segment I (between the restriction sites Nde I (3) and Sac I (194)).



• completely unchanged native sequence of 46 codons (between Pro102 and Arg147) in segment III.

• elimination of two E. coli rare codons (Arg148 and Gly150) at the terminal 
end of the segment III. 
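The codon-replacement method listed above can be sketched as a lookup that swaps selected rare E. coli codons for synonymous preferred ones and leaves every other codon untouched. In this Python sketch the mapping is a hypothetical example built from the rare codons named in the background section (AGG, AGA, CGA, CTA, ATA, CCC); it is not the complete substitution set used for Fopt5, which also depended on the segment and on GC content.

```python
# Standard genetic code (NCBI table 1) in the conventional TCAG ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical rare -> preferred mapping, for illustration only.
RARE_TO_PREFERRED = {
    "AGG": "CGT", "AGA": "CGT", "CGA": "CGT",  # Arg
    "CTA": "CTG",  # Leu
    "ATA": "ATT",  # Ile
    "CCC": "CCG",  # Pro
}

# Every replacement must be synonymous, otherwise the protein would change.
for rare, preferred in RARE_TO_PREFERRED.items():
    if CODON_TABLE[rare] != CODON_TABLE[preferred]:
        raise ValueError(f"{rare}->{preferred} is not synonymous")


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def replace_rare_codons(seq: str, mapping: dict[str, str] = RARE_TO_PREFERRED) -> str:
    """Swap each codon that appears in mapping and keep all other codons."""
    return "".join(mapping.get(c, c) for c in split_codons(seq))


if __name__ == "__main__":
    original = "ATGATAAGGCTAGGC"  # hypothetical fragment: Met-Ile-Arg-Leu-Gly
    optimized = replace_rare_codons(original)
    print(optimized)                                 # ATGATTCGTCTGGGC
    print(translate(original) == translate(optimized))  # True
```

Because the substitutions are synonymous by construction, the encoded amino acid sequence is unchanged, which matches the definition of 'synthetic gene' used in this document.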

Optimization of the gene coding for hG-CSF of the present invention does not include changes in the TIR, in the RBS or in the region between the start codon and the RBS.

The synthetic gene encoding hG-CSF of the present invention enables expression in E. coli at an expression level equal to or higher than 52%. Furthermore, expression levels of about 55% or even about 60% can be obtained. The high expression level of the synthetic gene coding for hG-CSF of the present invention enables high yields of hG-CSF production, faster and simpler purification and isolation of the heterologous hG-CSF and easier in-process control, and makes the whole production process more economical. Efficient production of hG-CSF on an industrial scale is therefore enabled. The produced hG-CSF is suitable for clinical use in medicine.

The construction of the synthetic gene of the present invention begins with the initial preparation of the native hG-CSF gene and of the plasmids. The gene coding for native hG-CSF can be of human origin, but the same principle can be applied to any gene which is homologous in the regions containing the unique restriction sites used for subcloning the de novo synthesized gene segments. The plasmid for mutagenesis was chosen for its ability to enable the successive introduction of point mutations. Plasmids containing the desired mutation were selected and enriched by using an additional selection primer that changed the unique restriction site EcoRI into EcoRV or vice versa (Transformer™ Site-Directed Mutagenesis Kit (Clontech)). The gene and the plasmid are constructed in such a way that point mutations can be introduced by cassette mutagenesis.

After the initial preparation of the native gene coding for hG-CSF and of the plasmids, the native gene coding for hG-CSF is optimized, i.e. the synthetic gene coding for hG-CSF is constructed. The optimization begins with the division of the native gene coding for hG-CSF into four segments (I, II, III and IV), which are, or after oligonucleotide mutagenesis will be, separated by unique restriction sites; changes are then introduced into the individual segments. In some segments the gene sequence is changed, whereas in others it is not changed (Figure 1). The resulting optimized synthetic gene coding for hG-CSF therefore consists of a partially preserved native sequence (segment III) and of 5' and 3' coding regions which are synthesized de novo (segments I, II and IV).

The changes in the individual segments:
Segment I: Replacement of E. coli rare codons with E. coli preference codons and replacement of GC rich regions with AT rich regions

Thr2 (ACC→ACA), Pro3 (CCC→CCA), Gly5 (GGC→GGT), Pro6 (CCT→CCA), Ala7 (GCC→GCT), Ser8 (AGC→TCT), Ser9 (TCC→TCT), Pro11 (CCC→CCG), Gln12 (CAG→CAA), Phe14 (TTC→TTT), Leu16 (CTG→TTG), Lys17 (AAG→AAA), Cys18 (TGC→TGT), Glu20 (GAG→GAA), Val22 (GTG→GTT), Arg23 (AGG→CGT), Lys24 (AAG→AAA), Ile25 (ATC→ATT), Gln26 (CAG→CAA), Gly27 (GGC→GGT), Gly29 (GGC→GGT), Ala31 (GCG→GCT), Leu32 (CTC→TTA), Gln33 (CAG→CAA), Glu34 (GAG→GAA), Lys35 (AAG→AAA), Ala38 (GCC→GCA), Thr39 (ACC→ACT), Tyr40 (TAC→TAT), Lys41 (AAG→AAA), Cys43 (TGC→TGT), His44 (CAC→CAT), Pro45 (CCC→CCA), Glu46 (GAG→GAA), Glu47 (GAG→GAA), Val49 (GTG→GTT), Leu51 (CTC→TTA), Gly52 (GGA→GGT), His53 (CAC→CAT), Gly56 (GGC→GGT), Ile57 (ATC→ATT), Pro58 (CCC→CCG), Pro61 (CCC→CCT)
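A codon-by-codon comparison of the native and optimized sequences reproduces substitution lists like the one above and confirms that every change is synonymous. This Python sketch (our own illustration) uses the first 12 codons of both sequences as printed in Figure 2 and numbers Met as codon 1, following the numbering used in this document.

```python
# Standard genetic code (NCBI table 1) in the conventional TCAG ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_changes(native: str, optimized: str) -> list[tuple[int, str, str, str]]:
    """Return (codon number, amino acid, native codon, new codon) for each change.

    Codon 1 is Met; a ValueError is raised if any change alters the amino acid.
    """
    nat, opt = split_codons(native), split_codons(optimized)
    if len(nat) != len(opt):
        raise ValueError("sequences encode proteins of different length")
    changes = []
    for number, (old, new) in enumerate(zip(nat, opt), start=1):
        if old == new:
            continue
        if CODON_TABLE[old] != CODON_TABLE[new]:
            raise ValueError(f"codon {number}: {old}->{new} changes the amino acid")
        changes.append((number, CODON_TABLE[old], old, new))
    return changes


# Met1 to Gln12 of the native gene (Figure 2a) and of Fopt5 (Figure 2b).
NATIVE_5P = "ATGACCCCCCTGGGCCCTGCCAGCTCCCTGCCCCAG"
FOPT5_5P = "ATGACACCACTGGGTCCAGCTTCTTCTCTGCCGCAA"

if __name__ == "__main__":
    for number, aa, old, new in codon_changes(NATIVE_5P, FOPT5_5P):
        print(f"{aa}{number} ({old}->{new})")
```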

Segment II: Replacement of E. coli rare codons with E. coli preference codons
Cys65 (TGC→TGT), Pro66 (CCC→CCG), Ala69 (GCC→GCG), Leu76 (TTG→CTG),
Leu79 (CTC→CTG), Gly82 (GGC→GGT), Leu83 (CTT→CTG), Phe84 (TTC→TTT),
Leu85 (CTC→CTG), Tyr86 (TAC→TAT), Gly88 (GGG→GGT), Leu89 (CTC→CTG),
Ala92 (GCC→GCG), Gly95 (GGG→GGC), Ile96 (ATA→ATT), Pro98 (CCC→CCG),
Glu99 (GAG→GAA), Leu100 (TTG→CTG), Gly101 (GGT→GGG)

Segment III: Replacement of two E. coli rare codons situated just before the restriction site Nhe I

Arg148 (CGG→CGT), Gly150 (GGA→GGT)



Segment IV: Replacement of a long cluster of E. coli rare codons at the 
terminal end of the gene with E. coli preference codons. 

Gln159 (CAG→CAA), Ser160 (AGC→TCT), Phe161 (TTC→TTT), Glu163 (GAG→GAA), Val164 (GTG→GTT), Ser165 (TCG→AGC), Tyr166 (TAC→TAT), Arg167 (CGC→CGT), Leu169 (CTA→CTG), Arg170 (CGC→CGT), His171 (CAC→CAT), Leu172 (CTT→CTG), Ala173 (GCG→GCT), Pro175 (CCC→CCG)

After construction of the synthetic gene coding for hG-CSF, the optimized synthetic gene is subcloned into the final plasmid vector, which is selected from the group of pET vectors (Novagen). These vectors contain a strong T7 promoter. The plasmid vector pET3a is preferably used. The expression plasmid thus constructed is then transformed into the production strain, which is selected from the group of strains carrying a chromosomal copy of the gene for T7 RNA polymerase. Most preferably, E. coli BL21 (DE3) is used.

The procedure continues with the preparation of the inoculum and the fermentation process. The fermentation can be performed at 37°C, but is preferably performed at 25°C.

The accumulated heterologous hG-CSF is found in inclusion bodies and is suitable for the renaturation process and for use in the isolation procedures.

Description of the drawings: 

Figure 1: Scheme of the optimization steps of the gene coding for hG-CSF

Figure 2: a) DNA sequence of the native gene coding for hG-CSF (GenBank: NM_000759)

b) DNA sequence of the optimized (Fopt5) gene coding for hG-CSF. The bases which differ from the native gene are bolded.
Figure 3: a) SDS-PAGE (4% stacking, 15% separating; stained with Coomassie brilliant blue) of protein samples from the induced and non-induced cultures of the production strain E. coli BL21 (DE3) with the expression plasmid pET3a at 25°C and 42°C. The cultures were cultivated in LBG10/amp100 medium.

Legend: 

Load 1: BL21 (DE3) pET3a-hG-CSF non-induced at 25°C (10 µl) (no traces of hG-CSF)

Load 2: BL21 (DE3) pET3a-hG-CSF induced with IPTG at 25°C (10 µl) (slight traces of hG-CSF)

Load 3: BL21 (DE3) pET3a-hG-CSF non-induced at 42°C (10 µl) (no traces of hG-CSF)
Load 4: BL21 (DE3) pET3a-hG-CSF induced with IPTG at 42°C (10 µl) (under 1% hG-CSF)

Load 5: standard filgrastim, 0.3 µg for Coomassie brilliant blue
Load 6: BL21 (DE3) pET3a-Fopt5 non-induced at 25°C (5 µl) (6% hG-CSF)
Load 7: BL21 (DE3) pET3a-Fopt5 induced with IPTG at 25°C (5 µl) (over 50% hG-CSF)

b) detection with antibodies (Western blot); primary rabbit antibodies; 
secondary goat anti-rabbit IgG antibodies conjugated with horseradish peroxidase, 
substrate p-naphthol 

The samples for the detection with antibodies were loaded in the same amounts and in the same order as for SDS-PAGE (Figure 3a), with the exception of the standard, of which 0.08 µg was loaded.

Figure 4: SDS-PAGE (4% stacking, 15% separating; stained with Coomassie brilliant blue) of protein samples from the induced culture of the production strain E. coli BL21 (DE3) with the expression plasmid pET3a at 25°C. The cultures were cultivated in GYSP/amp100 and LYSP/amp100 medium.

Legend: 

Load 1: LMW (Bio-Rad)

Load 2: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in LYSP/amp100 (60% hG-CSF)

Load 3: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in LYSP/amp100 (over 54% hG-CSF)
Load 4: rhG-CSF (0.6 µg)
Load 5: rhG-CSF (1.5 µg)

Load 6: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in GYSP/amp100 (4 µl) (55% hG-CSF)

Load 7: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in GYSP/amp100 (5 µl) (52% hG-CSF)

Examples: 

Example 1: Construction of the optimal gene: Fopt5
Example 1a: Initial gene and plasmid preparations

The gene coding for hG-CSF was amplified from BBG13 (R&D) by PCR, and the primers were also used to introduce the restriction sites Nde I and BamH I at the start and at the terminal end of the gene. The gene was then inserted into the plasmid pCytexAH.H (see the description below) between the restriction sites Nde I and BamH I. All further optimization steps for the expression of the gene in E. coli were also performed in this plasmid.

During the initial gene preparation, the EcoRV site was deleted from the gene (oligo M20z108). This was done so that individual mutations could be introduced by oligonucleotide-directed mutagenesis in the plasmid pCytexAH.H with the Transformer™ Site-Directed Mutagenesis Kit (Clontech). Mutants in the plasmid pCytexAH.H-G-CSF could therefore be selected via the restriction sites EcoRI/EcoRV.

The starting plasmid pCYTEXP1 (Medac, Hamburg) was reconstructed to enable constitutive expression. This was done by excising the part of the gene coding for the cI857 repressor between the two HindIII restriction sites. The resulting plasmid was named pCytexAH.H.

The oligonucleotide for the deletion of the EcoRV site from the gene coding for hG-CSF:
M20z108 5'-CCT GGA AGG AAT ATC CCC CG-3'
Example 1b: Codon optimization (Figure 1)

In the first optimization step, the synthetic gene between the restriction sites Nde I and Sac I was constructed by ligation of five cassettes (A, B, C, D, E) composed of complementary oligonucleotides. This synthetic part of the gene represents segment I. Segment I replaced the part of the native hG-CSF gene between the restriction sites Nde I and Sac I: the first part of the gene was excised between the restriction sites Nde I and Sac I and replaced with the synthetically prepared cassettes. The process was performed in two steps. In the first step, cassette A was ligated to the Nde I site and cassette E was ligated to the Sac I site. After 16 hours at 16°C the ligation mixture was precipitated with ethanol to remove the excess of unbound oligonucleotides. In the second step, the central part of the insert (cassettes B, C and D, previously ligated from their complementary oligonucleotides) was added, and ligation was performed for 16 hours at 16°C.

In the second optimization step, the two codons most critical for E. coli located in segment III, namely Arg148 and Gly150, were replaced by oligonucleotide-directed mutagenesis (Transformer™ Site-Directed Mutagenesis Kit (Clontech)).
In the third optimization step, segment IV was constructed in a similar way to segment I, but without the intermediate ethanol precipitation. Segment IV represents the last part of the gene between the restriction sites Nhe I and BamH I and is composed of two pairs of complementary oligonucleotides (cassettes F and G).
In the fourth optimization step, the rare codon coding for Ile96 (segment II) was replaced (ATA→ATT) by oligonucleotide-directed mutagenesis (Transformer™ Site-Directed Mutagenesis Kit (Clontech)), and the restriction site for Apa I (Gly101 GGT→GGG) was introduced at the 3' end of segment II.
The Apa I restriction site was then used in the fifth optimization step to replace the native gene between Sac I and Apa I with synthetic DNA (segment II). This synthetic DNA is composed of three pairs of complementary oligonucleotides (cassettes H, I and J). This was performed similarly to the first step, with cassette I added later.
1st optimization step:

complementary pairs of oligonucleotides (Nde I - Sac I; segment I in Figure 1):
Cassette A: composed of complementary oligonucleotides zg1os1 and sp1os2:
zg1os1 5' TAT GAC ACC ACT GGG TCC AGC TTC TTC TCT GCC GCA AAG 3'
sp1os2 5' GCA GAG AAG AAG CTG GAC CCA GTG GTG TCA 3'

Cassette B: composed of complementary oligonucleotides zg2os3 and sp2os4:
zg2os3 5' CTT TCT GTT GAA ATG TTT AGA ACA AGT TCG TAA AAT TCA AG 3'
sp2os4 5' GAA CTT GTT CTA AAC ATT TCA ACA GAA AGC TTT GCG 3'

Cassette C: composed of complementary oligonucleotides zg3os5 and sp3os6:
zg3os5 5' GTG ATG GTG CAG CTT TAC AAG AAA AAC TGT GTG 3' 
sp3os6 5' GTT TTT CTT GTA AAG CTG CAC CAT CAC CTT GAA TTT TAC 3' 

Cassette D: composed of complementary oligonucleotides zg4os7 and sp4os8:
zg4os7 5' CAA CTT ATA AAC TGT GTC ATC CAG AAG AAC TGG TTC TGT TAG 3'

sp4os8 5' CAG TTC TTC TGG ATG ACA CAG TTT ATA AGT TGC ACA CA 3' 

Cassette E: composed of complementary oligonucleotides zg5os9 and sp5os10:
zg5os9 5' GTC ATT CTC TGG GTA TTC CGT GGG CTC CTC TGA GCT 3'
sp5os10 5' CAG AGG AGC CCA CGG AAT ACC CAG AGA ATG ACC TAA CAG AAC 3'
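Within each cassette the two oligonucleotides are complementary over part of their length, and their single-stranded ends are left to pair with the neighbouring cassettes during ligation. A minimal Python sketch (our own illustration, not a tool used in this work) finds the annealed stretch as the longest common substring between one oligonucleotide and the reverse complement of its partner, using cassette C above as input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Upper-case a sequence and drop the spaces used to group codons."""
    return seq.upper().replace(" ", "")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Dynamic-programming longest common substring (first one found on ties)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def annealed_region(top: str, bottom: str) -> str:
    """Stretch of top that pairs with bottom when the two are annealed."""
    return longest_common_substring(clean(top), reverse_complement(bottom))


# Cassette C oligonucleotides as listed above.
ZG3OS5 = "GTG ATG GTG CAG CTT TAC AAG AAA AAC TGT GTG"
SP3OS6 = "GTT TTT CTT GTA AAG CTG CAC CAT CAC CTT GAA TTT TAC"

if __name__ == "__main__":
    region = annealed_region(ZG3OS5, SP3OS6)
    print(region, len(region))
```

The same check applied to the other cassettes would flag oligonucleotide pairs whose overlap is unexpectedly short before any ligation is attempted.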

2nd optimization step: oligonucleotide for the replacement of the most critical codons by oligonucleotide-directed mutagenesis
replacement of Arg148 (CGG→CGT) and Gly150 (GGA→GGT)
m38os16

5' CTC TGC TTT CCA GCG CCG TGC AGG TGG GGT CCT GGT TG 3'
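Comparing the mutagenic oligonucleotide with the matching window of the template shows exactly which bases it changes. In this Python sketch (our own illustration) the native window is copied from Figure 2a; the reading frame value 1 reflects that the in-frame CAG codon preceding Arg147 starts at index 10 of this window.

```python
def clean(seq: str) -> str:
    """Upper-case a sequence and drop the spaces used to group codons."""
    return seq.upper().replace(" ", "")


def mismatch_positions(template: str, oligo: str) -> list[int]:
    """0-based positions where the oligonucleotide differs from the template."""
    t, o = clean(template), clean(oligo)
    if len(t) != len(o):
        raise ValueError("template window and oligonucleotide must be equally long")
    return [i for i, (a, b) in enumerate(zip(t, o)) if a != b]


def changed_codons(template: str, oligo: str, frame: int) -> list[tuple[int, str, str]]:
    """(start index, template codon, oligo codon) for each complete codon with a mismatch.

    frame is the index of the first base of a complete codon within the window.
    """
    t, o = clean(template), clean(oligo)
    hits = set(mismatch_positions(t, o))
    result = []
    for start in range(frame, len(t) - 2, 3):
        if hits & {start, start + 1, start + 2}:
            result.append((start, t[start:start + 3], o[start:start + 3]))
    return result


# Native window around Arg148/Gly150 (Figure 2a) and the oligonucleotide m38os16.
NATIVE_WINDOW = "CTCTGCTTTCCAGCGCCGGGCAGGAGGGGTCCTGGTTG"
M38OS16 = "CTC TGC TTT CCA GCG CCG TGC AGG TGG GGT CCT GGT TG"

if __name__ == "__main__":
    print(mismatch_positions(NATIVE_WINDOW, M38OS16))
    print(changed_codons(NATIVE_WINDOW, M38OS16, frame=1))
```

Both mismatches fall on third codon positions, which is why the two substitutions are silent.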

3rd optimization step: complementary pairs of oligonucleotides (Nhe I - BamH I; segment IV in Figure 1):
Cassette F: composed of complementary oligonucleotides zg6os11 and sp6os12:
zg6os11 5' CTA GCC ATC TGC AAT CTT TTC TGG AAG TTA G 3'
sp6os12 5' ACG ATA GCT AAC TTC CAG AAA AGA TTG CAG ATG G 3'

Cassette G: composed of complementary oligonucleotides zg7os13 and sp7os14:
zg7os13 5' CTA TCG TGT TCT GCG TCA TCT GGC TCA GCC GTG ATA AG 3'
sp7os14 5' GAT CCT TAT CAC GGC TGA GCC AGA TGA CGC AGA AC 3'

4th optimization step: oligonucleotide for the introduction of Apa I (Gly101 GGT→GGG) and the replacement of the rare codon Ile96 by oligonucleotide-directed mutagenesis

insertion of Apa I (Gly101 GGT→GGG) and replacement of Ile96 (ATA→ATT):
Apalosi 5 

5' GCC CTG GAG GGG ATT TCC CCC GAG TTG GGG CCC ACC TTG GAC AC 3'

5th optimization step: complementary pairs of oligonucleotides (Sac I - Apa I; segment II in Figure 1):

Cassette H: composed of complementary oligonucleotides zg8os18 and sp8os19:
zg8os18 5' CCT GTC CGA GCC AGG CGC TGC AGC TGG CAG GCT GCC TGA G 3'

sp8os19 5' CCT GCC AGC TGC AGC GCC TGG CTC GGA CAG GAG CT 3'

Cassette I: composed of complementary oligonucleotides zg9os20 and sp9os21:
zg9os20 5' CCA ACT GCA TAG CGG TCT GTT TCT GTA TCA GGG TCT GCT G 3'

sp9os21 5' CTG ATA CAG AAA CAG ACC GCT ATG CAG TTG GCT CAG GCA G 3'

Cassette J: composed of complementary oligonucleotides zg10os22 and sp10os23:
zg10os22 5' CAG GCG CTG GAA GGC ATT TCC CCG GAA CTG GGG CC 3'
sp10os23 5' CCA GTT CCG GGG AAA TGC CTT CCA GCG CCT GCA GCA GAC C 3'

Example 2: Expression of the synthetic gene coding for hG-CSF in E. coli

The optimized gene Fopt5 was excised from the plasmid pCytexAH.H with the restriction enzymes Nde I and BamH I and subcloned into the final expression plasmid pET3a (Novagen, Madison, USA), which was then transformed into the production strain E. coli BL21 (DE3).

The cultures were prepared on a shaker for 24 hours at 160 rpm at 25°C or for 15 hours at 42°C:

- in LBG10/amp100 medium (10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, 100 mg/L ampicillin). Induction was performed by adding IPTG to a final concentration of 0.4 mM.
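Only the final IPTG concentration (0.4 mM) is stated here. As a worked example, the stock volume to add follows from a simple mass balance that also accounts for the volume of stock itself; the 1 M stock and the 100 ml culture volume below are assumptions for illustration, not values from this document.

```python
def stock_volume_ml(final_mM: float, culture_ml: float, stock_mM: float) -> float:
    """Volume of stock to add so that the culture reaches final_mM.

    Solves stock_mM * v = final_mM * (culture_ml + v) for v.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml / (stock_mM - final_mM)


if __name__ == "__main__":
    # Assumed values: 100 ml culture, 1 M (1000 mM) IPTG stock, 0.4 mM target.
    v = stock_volume_ml(final_mM=0.4, culture_ml=100.0, stock_mM=1000.0)
    print(f"add {v * 1000:.1f} µl of stock")
```

With a stock this concentrated the correction for the added volume is negligible, and the familiar C1·V1 = C2·V2 estimate of 40 µl gives practically the same answer.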

The cultures were prepared on a shaker for 24 hours at 160 rpm at 25°C:

- in GYSP/amp100 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, trace metals, 100 mg/L ampicillin). Induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.

- in LYSP/amp100 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 6 g/l glycerol, 4 g/l lactose, trace metals, 100 mg/L ampicillin). Induction was achieved by adding lactose to the medium.

The inoculum was prepared in LBG/amp100 medium (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl, 2.5 g/L glucose, 100 mg/L ampicillin) overnight at 25°C and 160 rpm.

For analysis, 8 ml of each culture was centrifuged at 5000 rpm. The pellets were
then resuspended in 10 mM Tris-HCl, pH 8.0, using 6.66 ml of buffer per calculated
OD600 unit. This equalized the amounts loaded onto the gel, which was necessary
because the final OD600 values of the cultures in these examples differed. The
samples were mixed 3:1 with 4x SDS sample buffer containing DTT (pH 8.7), heated
for 10 minutes at 95°C, centrifuged and loaded onto the gel.
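The normalization step can be made explicit. This sketch assumes that "6.66 ml of buffer per calculated OD600 unit" means the resuspension volume equals 6.66 ml multiplied by the final OD600 of the culture. With a fixed 8 ml harvest, the cell density in every resuspension is then the same regardless of OD600. Mixing 3 parts sample with 1 part 4x buffer gives 1x sample buffer. The function names are hypothetical.

```python
# Sketch of the Example 2 sample normalization (assumed reading of the text):
# the resuspension volume is proportional to the culture's final OD600.
ML_BUFFER_PER_OD_UNIT = 6.66  # ml of 10 mM Tris-HCl, pH 8.0, per OD600 unit


def resuspension_volume_ml(od600: float) -> float:
    """Buffer volume for a pellet from a culture with the given final OD600."""
    return ML_BUFFER_PER_OD_UNIT * od600


def relative_cell_density(od600: float, culture_ml: float = 8.0) -> float:
    """Cells per ml of resuspension in arbitrary units (OD600 x ml / ml)."""
    return od600 * culture_ml / resuspension_volume_ml(od600)


def final_buffer_strength(stock_x: float = 4.0, sample_parts: int = 3,
                          buffer_parts: int = 1) -> float:
    """Final SDS sample buffer strength (x) after mixing sample and buffer."""
    return stock_x * buffer_parts / (sample_parts + buffer_parts)
```

Two cultures harvested at OD600 2.0 and 5.0 are resuspended in 13.32 ml and 33.3 ml respectively, and `relative_cell_density` returns the same value for both, which is the equal-loading condition the text describes.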

The proportions of accumulated hG-CSF, found in the form of inclusion bodies, for
the native and the optimized gene are given in Table 1.
Table 1: Comparison of the hG-CSF accumulation levels for the native and the
optimized gene (Fopt5)

Proportion (%) of hG-CSF relative to total proteins

Expression system            Cultivation and induction    Native gene coding for hG-CSF    Optimized gene Fopt5
                             conditions
Cultivation temperature                                   25°C            42°C             25°C
pET3a / E. coli BL21 (DE3)   LBG10/amp100 medium,         traces          < 1 %            > 40 %
                             0.4 mM IPTG
pET3a / E. coli BL21 (DE3)   GYSP/amp100 medium           < 1 %           < 1 %            > 52 %
pET3a / E. coli BL21 (DE3)   LYSP/amp100 medium           < 1 %           < 1 %            > 52 %


The indicated hG-CSF contents were obtained by densitometric analysis of SDS-PAGE
gels stained with Coomassie Brilliant Blue in the case of Fopt5 (Figure 3a and
Figure 4), and by immunodetection with antibodies in the case of the unoptimized
gene (Figure 3b). For Fopt5, the relative proportion of the expressed protein was
estimated by profile analysis of the gels (Molecular Analyst software, Bio-Rad)
using an Imaging Densitometer Model GS670 (Bio-Rad).
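The text does not state how the profile analysis converts a lane scan into a percentage. A common approach, assumed here, is to divide the integrated area of the hG-CSF band by the summed area of all bands in the same lane. The band areas in the usage note are hypothetical numbers, not measurements behind Table 1, and `band_percentage` is an illustrative name.

```python
# Sketch: express one band as a percentage of all protein in a lane,
# given integrated band areas from a densitometric lane profile.
from typing import Sequence


def band_percentage(band_areas: Sequence[float], target_index: int) -> float:
    """Percentage of the lane's total integrated signal found in one band."""
    total = sum(band_areas)
    if total <= 0:
        raise ValueError("lane profile has no signal")
    return 100.0 * band_areas[target_index] / total
```

With hypothetical band areas `[12.0, 8.0, 55.0, 25.0]` and the hG-CSF band at index 2, the function returns 55.0, i.e. hG-CSF would account for 55 % of the stained protein in that lane.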
Description of DNA sequences

<110> Lek Pharmaceuticals d.d.

<120> Synthetic gene coding for human granulocyte colony-stimulating factor for
the expression in E. coli

<160> 2

<210> SEQ ID NO: 1
<211> 525 base pairs
<212> DNA
<213> synthetic sequence
<220> gene

<400> SEQ ID NO: 1

atgacaccac tgggtccagc ttcttctctg ccgcaaagct ttctgttgaa atgtttagaa 60
caagttcgta aaattcaagg tgatggtgca gctttacaag aaaaactgtg tgcaacttat 120
aaactgtgtc atccagaaga actggttctg ttaggtcatt ctctgggtat tccgtgggct 180
cctctgagct cctgtccgag ccaggcgctg cagctggcag gctgcctgag ccaactgcat 240
agcggtctgt ttctgtatca gggtctgctg caggcgctgg aaggcatttc cccggaactg 300
gggcccacct tggacacact gcagctggac gtcgccgact ttgccaccac catctggcag 360
cagatggaag aactgggaat ggcccctgcc ctgcagccca cccagggtgc catgccggcc 420
ttcgcctctg ctttccagcg ccgtgcaggt ggggtcctgg ttgctagcca tctgcaatct 480
tttctggaag ttagctatcg tgttctgcgt catctggctc agccg 525
<210> SEQ ID NO: 2
<211> 528 base pairs
<212> DNA
<213> synthetic sequence
<220> gene
<400> SEQ ID NO: 2

atgacaccac tgggtccagc ttcttctctg ccgcaaagct ttctgttgaa atgtttagaa 60
caagttcgta aaattcaagg tgatggtgca gctttacaag aaaaactgtg tgcaacttat 120
aaactgtgtc atccagaaga actggttctg ttaggtcatt ctctgggtat tccgtgggct 180
cctctgagct cctgtccgag ccaggcgctg cagctggcag gctgcctgag ccaactgcat 240
agcggtctgt ttctgtatca gggtctgctg caggcgctgg aaggcatttc cccggaactg 300
gggcccacct tggacacact gcagctggac gtcgccgact ttgccaccac catctggcag 360
cagatggaag aactgggaat ggcccctgcc ctgcagccca cccagggtgc catgccggcc 420
ttcgcctctg ctttccagcg ccgtgcaggt ggggtcctgg ttgctagcca tctgcaatct 480
tttctggaag ttagctatcg tgttctgcgt catctggctc agccgtga 528
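The two listings can be cross-checked: SEQ ID NO: 2 is SEQ ID NO: 1 followed by a TGA stop codon, and 525 bp correspond to 175 codons. The sketch below translates SEQ ID NO: 2 with the standard genetic code; the result should be a single open reading frame of 175 residues beginning with Met and ending at the stop codon. The helper names are illustrative.

```python
# Sketch: translate SEQ ID NO: 2 with the standard genetic code.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon

SEQ_ID_NO_2 = "".join("""
atgacaccac tgggtccagc ttcttctctg ccgcaaagct ttctgttgaa atgtttagaa
caagttcgta aaattcaagg tgatggtgca gctttacaag aaaaactgtg tgcaacttat
aaactgtgtc atccagaaga actggttctg ttaggtcatt ctctgggtat tccgtgggct
cctctgagct cctgtccgag ccaggcgctg cagctggcag gctgcctgag ccaactgcat
agcggtctgt ttctgtatca gggtctgctg caggcgctgg aaggcatttc cccggaactg
gggcccacct tggacacact gcagctggac gtcgccgact ttgccaccac catctggcag
cagatggaag aactgggaat ggcccctgcc ctgcagccca cccagggtgc catgccggcc
ttcgcctctg ctttccagcg ccgtgcaggt ggggtcctgg ttgctagcca tctgcaatct
tttctggaag ttagctatcg tgttctgcgt catctggctc agccgtga
""".split()).upper()

# SEQ ID NO: 1 is the same sequence without the final TGA stop codon.
SEQ_ID_NO_1 = SEQ_ID_NO_2[:-3]


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    protein = translate(SEQ_ID_NO_2)
    print(len(SEQ_ID_NO_2), len(protein) - 1)  # bases, residues before the stop
    print(protein)
```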



Lek Pharmaceuticals d.d 
Patent claims

1. A DNA sequence coding for hG-CSF, characterized in that the sequence
comprises the nucleotide sequence of SEQ ID NO: 1.

2. A DNA sequence characterized in that the sequence comprises a nucleotide
sequence selected from the group comprising a partial sequence of SEQ ID NO: 1
and nucleic acids which hybridize with the sequence of SEQ ID NO: 1 under
stringent conditions.

3. An expression plasmid characterized in that the plasmid comprises the DNA
sequence according to claim 1 and a plasmid vector.

4. An expression plasmid characterized in that the plasmid comprises a DNA 
sequence according to claim 2 and a plasmid vector. 

5. The expression plasmid according to claims 3 and 4, characterized in that the
plasmid vector is selected from the group of pET vectors.

6. An expression system for the expression of the DNA sequence according to claim 1,
characterized in that the system comprises the DNA sequence, a plasmid vector
and a production strain of E. coli.

7. An expression system for the expression of the DNA sequence according to claim
2, characterized in that the system comprises the DNA sequence, a plasmid
vector and a production strain of E. coli.

8. The expression system according to claims 6 and 7 characterized in that the 
plasmid vector is selected from the group of pET vectors. 

9. The expression system according to claims 6 and 7, characterized in that the
production strain is E. coli BL21 (DE3).

10. A process for construction of the DNA sequence according to claim 1,
characterized in that the process comprises methods selected from the group
comprising:
- replacement of some E. coli rare codons with E. coli preferred codons,
- replacement of some GC-rich regions with AT-rich regions,

and further comprises a completely unchanged part of the native sequence
coding for hG-CSF.

11. A process for construction of the DNA sequence according to claim 10,
characterized in that the process does not involve changes in the regions from
the group comprising: the translation initiation region, the ribosome binding
site and the region between the start codon and the ribosome binding site.
12. Expression of the DNA sequence according to claim 1 in E. coli.
13. Expression of the DNA sequence according to claim 2 in E. coli.
Abstract

The invention relates to a synthetic gene coding for hG-CSF which enables
expression in E. coli with an expression level of more than 52% recombinant
hG-CSF relative to the total cellular proteins after expression.





Figure 2

a) 

ATGACCCCCCTGGGCCCTGCCAGCTCCCTGCCCCAGAGCTTCCTGCTCAAGTG 

CTTAGAGCAAGTGAGGAAGATCCAGGGCGATGGCGCAGCGCTCCAGGAGAAGC 

TGTGTGCCACCTACAAGCTGTGCCACCCCGAGGAGCTGGTGCTGCTCGGACAC 

TCTCTGGGCATCCCCTGGGCTCCCCTGAGCTCCTGCCCCAGCCAGGCCCTGCA 

GCTGGCAGGCTGCTTGAGCCAACTCCATAGCGGCCTTTTCCTCTACCAGGGGC 

TCCTGCAGGCCCTGGAAGGGATATCCCCCGAGTTGGGTCCCACCTTGGACACA 

CTGCAGCTGGACGTCGCCGACTTTGCCACCACCATCTGGCAGCAGATGGAAGA 

ACTGGGAATGGCCCCTGCCCTGCAGCCCACCCAGGGTGCCATGCCGGCCTTCG 

CCTCTGCTTTCCAGCGCCGGGCAGGAGGGGTCCTGGTTGCTAGCCATCTGCAG 

AGCTTCCTGGAGGTGTCGTACCGCGTTCTACGCCACCTTGCGCAGCCC 

b) 

ATGACACCACTGGGTCCAGCTTCTTCTCTGCCGCAAAGCTTTCTGTTGAAATG 
TTTAGAACAAGTTCGTAAAATTCAAGGTGATGGTGCAGCTTTACAAGAAAAAC 
TGTGTGCAACTTATAAACTGTGTCATCCAGAAGAACTGGTTCTGTTAGGTCAT 
TCTCTGGGTATTCCGTGGGCTCCTCTGAGCTCCTGTCCGAGCCAGGCGCTGCA 
GCTGGCAGGCTGCCTGAGCCAACTGCATAGCGGTCTGTTTCTGTATCAGGGTC 
TGCTGCAGGCGCTGGAAGGCATTTCCCCGGAACTGGGGCCCACCTTGGACACA 
CTGCAGCTGGACGTCGCCGACTTTGCCACCACCATCTGGCAGCAGATGGAAGA 
ACTGGGAATGGCCCCTGCCCTGCAGCCCACCCAGGGTGCCATGCCGGCCTTCG 
CCTCTGCTTTCCAGCGCCGTGCAGGTGGGGTCCTGGTTGCTAGCCATCTGCAA 
TCTTTTCTGGAAGTTAGCTATCGTGTTCTGCGTCATCTGGCTCAGCCG 
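Panel b) above is identical to SEQ ID NO: 1, while panel a) differs from it and appears to be the native hG-CSF coding sequence. One way to see where GC-rich stretches were replaced by AT-rich ones is to compare sliding-window GC profiles of the two panels. The sketch below computes such a profile; the window of 60 bases, step of 30 and threshold of 0.65 are arbitrary illustrative choices, not parameters from the source.

```python
# Sketch: sliding-window GC content for locating GC-rich stretches,
# e.g. when comparing panels a) and b) of Figure 2.
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int = 60, step: int = 30) -> list:
    """List of (start, GC fraction) for each complete window of `window` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def gc_rich_windows(seq: str, threshold: float = 0.65, **kwargs) -> list:
    """Start positions of windows whose GC fraction exceeds `threshold`."""
    return [start for start, gc in gc_profile(seq, **kwargs) if gc > threshold]
```

Running `gc_rich_windows` on the sequence of panel a) and on that of panel b) and comparing the two lists of start positions would show which regions lost GC-rich windows in the optimized gene.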
Figure 3